GeneJ 2011-04-16T21:22:44-07:00

What will it take to get BetterGEDCOM E&C defined?

What will it take to get the BetterGEDCOM E&C part of the model defined?

Have we made any progress this first week?

(pulling this discussion over from the Developer's meeting page)

GeneJ 2011-04-18T07:11:49-07:00

@Adrian

You wrote, "allay your concerns."

As much as it might appear otherwise (I just hate being logged out), my concern is less about specifics and more about progress in developing the materials by which some objective review can be made to support decision making about BetterGEDCOM E&C.

See the comments in the opening to the discussion and on the related wiki page. See also Geir's comments:
http://bettergedcom.wikispaces.com/message/view/Defining+E%26C+for+BetterGEDCOM/37787742#37793206

Pushed for time.

PS So many fine points have been made in your postings and all the others in this thread. I hope later this week to be less pressed for time and in a better position to respond to each.

GeneJ 2011-04-18T07:48:49-07:00

@Adrian:

P.S. I raised concepts of "evidence persons" and "snippets/logic and reasoning" thinking we might consider a topical outline as an approach to developing the materials referred to just above.

I can think of other approaches too.

There is a decision making process that will work for E&C--let's make a plan and get 'er done. --GJ

AdrianB38 2011-04-18T09:09:18-07:00

Sorry Gene - you've lost me when you refer to "more progress developing the materials by which some objective review can be made to support decision making about BetterGEDCOM E&C."

Objectively, what decision? Or, what review?

We're creating a requirements catalogue here. Decisions about the contents of requirements catalogues are made under several circumstances:
- when requirements contradict;
- when requirements fail a cost / benefit test;
- when requirements work against the strategic or tactical direction of the company, possibly as laid down in standards or legislation;
- when requirements cannot be developed to a meaningful conclusion;

And probably some more besides.

Hey - maybe this is the list you need?

If I run the E&C requirements against those criteria:
- when requirements contradict - I know of no contradiction, PROVIDING nothing goes into the E&C model that mandates the use of things like personas / evidence persons / etc. I have seen nothing like that yet - the E&C Model contains the C-only Model;
- when requirements fail a cost / benefit test - not relevant here;
- when requirements work against the strategic or tactical direction of the company, possibly as laid down in standards or legislation. The equivalent here could be the contradiction of the BCG standards - again PROVIDING the data model does not mandate specific ways of working, I know of no way that it works against them;
- when requirements cannot be developed to a meaningful conclusion - my belief, as an IT specialist (ex-specialist?) is that the E&C MODEL is, in essence done. Questions like "Where do the proof arguments go?" (very important questions) actually apply across the board and not just to E&C;

There is, and Louis may not care for me bringing this up, one requirement that might cause us issues, and that is
"Syntax06 - Define one way of doing a thing"
"BetterGEDCOM should define just one way of doing one thing"

It might be thought that having a BG data model that accommodates both Evidence & Conclusion Models AND Conclusion-only Models is in contradiction of Requirement Syntax06. I believe this is not the case. The 2 are not in themselves in contradiction since one lives inside the other. Nobody said anything about using BetterGEDCOM in only one way - simply that if something from one of those methods is entered into a BG-compatible database, then there should be only one way of recording the data item. One place to enter a citation. One place to enter a note. And so one place to enter a proof argument if it's a note, and another one place to enter it if it's a citation.

(Louis may not be convinced. Heck - I'm not sure I'm convinced. All I know is that we have to accommodate all sorts of genealogists)

I thought the original concern was about the amount of progress that had been made on E&C, not about the validity of the E&C requirement. If it is the latter, then maybe the phrases above will help you to assess the validity.

As far as progress goes, my advice to you all as an IT (ex-)professional is that the E&C model AS SUCH doesn't require much work on it other than some more words describing what happens when data is added or modified or deleted. Have we covered off all the circumstances?

There are other areas, such as the recording of research plans and logs, proof arguments, etc, and the integration of those with events, attributes and relationships in the rest of the database that are further behind. (Yeah, that's me mentioning "proof" again).

In summary, yes, I'd like to make a plan - but I'm not sure what this plan is supposed to do in relation to specific objectives.

GeneJ 2011-04-18T10:07:01-07:00

Hi Adrian:

In the last Developers Meeting we discussed the proposal that all other work on the Wiki (Requirements Catalog and EE/GPS Support, etc.) should stop until a decision had been made on E&C.

Exactly what form that decision was to take (my term) wasn't decided. Nor did we decide how to organize the effort about how to advance beyond sporadic discussions (my term) toward a decision.

http://bettergedcom.wikispaces.com/DevelopersMtgNotes11April2011

I won't say this as well as someone else might, but at the most practical level, it seemed decisions on E&C could have ramifications on other requirements. Likewise, progressing too far on other requirements might impinge on decisions about E&C.

So, we stopped the other development discussions that had been progressing--but the idea was we wouldn't be stopped forever.

As in my comment above, let's make a plan.

I know there are other points in your post. I'll try to connect after the meeting.

Humm... waiting for meeting to begin.

gthorud 2011-04-18T14:20:53-07:00

Just for the record.

I did not read the last part of this discussion before Tom left.

As I stated in the meeting, I don't think it was proper to start a discussion about IF E&C should be in BG, the question was HOW to proceed.

The result is now that some people will go work on other short term things, and the rest of us will try to help Mike on E&C. Unfortunately without Tom.

GeneJ 2011-04-18T14:37:38-07:00

Geir,

I haven't had a chance to review all of the comments in the thread; however, I sure didn't see it as "either-or" but what and how (decisions as opposed to decision).

GeneJ 2011-04-18T19:15:11-07:00

@ Adrian:

I know I owe you many responses.

But to your list, I would add:

Does a body of material exist to support a general understanding of what the E&C model is?

While applications will be responsible for their implementation of this model, genealogy is much talked about. At the end of the day, users are not shy with their opinions.

The closest I can come to materials about the model is GenTech--but it's posted over and over again on the wiki that we should ignore it.

It doesn't help that, for those who don't ignore it, the first part of GenTech says, "now that we see it, we don't like it." A little further into the report, it's suggested that perhaps it won't benefit any users.

I know it's "not GenTech" because I've been told that. It's a two-edged sword, though--if it's not GenTech, then where is the documentation, review and comment about "E&C"?

Does this help?

AdrianB38 2011-04-19T03:00:39-07:00

"Does a body of material exist to support a general understanding of what the E&C model is?"
It's scattered throughout this Wiki. E.g. I wrote http://bettergedcom.wikispaces.com/evidence%20vs.%20conclusions

There's the discussions behind
http://bettergedcom.wikispaces.com/DeadEnds+Model

Obviously(?) the discussions behind
http://bettergedcom.wikispaces.com/Evidence+and+Conclusion+Process
(This illustrates a source of confusion in early Wiki days - the Evidence & Conclusion Process was eventually characterised as the process we go through using evidence to create conclusions. One we all go thru. The Evidence & Conclusion MODEL is about how that data gets stored in the program's database - specifically that you can always see the evidence input to the analysis stage and the conclusions output, and there is never a "destructive merge" of people's data.)

Searching for "evidence-person" will give several pages of results.

(I have to say that cleaning up the navigation column has meant it's now more difficult to find stuff!)

testuser42 2011-04-19T08:19:57-07:00

also see Tom's non-XML model definition:
http://deadendssoftware.com/DeadEndsModel.2.pdf
or version 1
http://deadendssoftware.com/DeadEndsDataModel.pdf

I have been able to understand the concept of E&C by reading these documents and the discussions here.
I'm not a native English speaker and not a programmer (but still a bit of a nerd) and I had not looked at a GEDCOM file before this project started.
I'm also only a hobby genealogist, maybe just past the beginner stage, who's only ever read half of one book about genealogy.

So, if even I can get E&C, I believe others should be able to get it, too. Nobody here is stupid.
Yes, it involves things that look like code - but that's just a very precise language.

It will take time to grasp the code-y stuff, but until then, please, believe the techy types when they all say that E&C does not do any harm for any way of doing genealogy. The opposite is true: only an E&C Model supports all the ways of doing genealogy.
Please don't be afraid.

testuser42 2011-04-19T08:34:55-07:00

Mike asked about the motivations of people - are we building a model for a transfer file or a new genealogy application?
That's a good question, and we should always check our decisions and ideas against this question. But Adrian has answered why these two often intermingle: Some of us are not satisfied with our software, and we see the problem boils down to GEDCOM's way of organizing data. A BG would free software developers so they can do more powerful things and don't have to let GEDCOM compatibility slow them down.

Personally, I stumbled on this project because I was a little frustrated with my software, and wanted to see how to get my data into another. I found out the transfer was never going to be easy, and it's not only my software's fault.
I also grew impatient with the way I could record my research in my software. With some research subjects, I'm at the "chasm", or it feels like that. I want to record all the pieces of data and their sources and what I think about both data and sources. I also want my "working hypotheses" stored and easily understood. I want to be more scientific, and have all that preserved for other software or other researchers.
Most of this can be done in really good software; maybe such software already exists. But I know that without an up-to-date, powerful replacement for GEDCOM, I would be stuck with that software forever; the data would never transfer if I wanted to do that. I don't like being held hostage...

So now I pray that we manage to keep our cool and work out a BetterGEDCOM that is so great that developers will want to use it :)

testuser42 2011-04-19T09:01:19-07:00

I really hope we've not chased Tom away for good.
He's the one who has thought longest about data model solutions to genealogy problems. The areas that he hasn't thought about much (like research logs or citation templates) are already being filled by the good work others are doing here. I felt we were getting somewhere good.

Do we still have a chance of making this project work? Defining a data model is after all a very technical project, and it's not good to chase away any techy person that's pitching in. Luckily, Tom documented his model quite well and explained his ideas repeatedly so that the basics have become clear. The devil is often in the details, and we may have to go after the details without Tom's direct input.

Also, Tom is one of the small application developers that in my view would be most likely to adopt the new BetterGedcom. Luis and Mike will hopefully support it, and maybe GRAMPS if we've not scared them with our bickering. Some other developers have checked in from time to time, like Ben Sayer (Lineascope) and Christoffer Owe (Cosoft). Maybe they would be interested, too. So that would be a small nucleus of software that can then start to "radiate" the benefits of BetterGedcom.

gthorud 2011-04-19T15:20:47-07:00

Here are some thoughts on how to proceed. But I am very open to other ideas.

Stuff that could go into the page as early as possible

- Create a drawing of an E&C tree (just saw testuser posting something)

- Describe the basic functionality

- Describe that Evidence Persons are an optional feature; not every program needs to change (one of possibly several requirements)

- Links to old discussions/pages, probably with a short statement about what has been discussed

The page could also have an issues list (random sequence below)

- Rules for entering info from sources (these are probably clear already)

- Rules for entering data in a conclusion person and where to put the conclusion statement – why are these the same (have been many discussions, probably 2 views re. copy info upwards or not - what to do in various situations eg. conflicting evidence)

- Relation to a source/citation model

- Relations to an administrative model

- What are the differences in what the Gentech model can do and what a multilevel E&C model can do. What are the benefits/drawbacks of each model?

- Interworking with other models – both ways - Gentech/NFS and existing Conclusion, one level, systems (possible problem with compressing eg a 3-4+ level tree into a 1 level one because the context represented by a level could be lost if we are not careful)

- Will it work if users create Evidence Persons for some Persons and only Conclusion persons for others (may not be a problem?)

- How can the model be used (and perhaps adapted) to support various ways of working, based on how data would be recorded for each method.

- What about E&C for other things than persons.

It would perhaps be possible to discuss these in separate discussions. Also, some of the issues are already discussed at length - but we need to sum up.

Please, start throwing stones at the above! - or add to it.

GeneJ 2011-04-17T19:35:21-07:00

Mike,

"...are really trying to design a new genealogy program instead of a new import/export data model."

You would not be alone in that concern.

GeneJ 2011-04-17T20:36:15-07:00

@Tom wrote, "...most of us are interested in achieving the advantages of record-based methodology, and to enable this methodology our evidence must be structured into records that computers can process."

I just Googled, (Genealogy "records based methodology"), without the parens.

There were 13 returns. Three of those are postings to BetterGEDCOM. There are several "Keeping Families of heroin addicts together,"
and a few other public health entries.

Now, I've read Ancestry Insider's "Chasm" entry several times. I've read your post about it on BetterGEDCOM several times.

I'm not sure why you don't believe I've crossed the "Chasm."

I fail to see how labeling me so brings us closer to the materials about E&C by which BetterGEDCOM can make an objective review. --GJ

Build a BetterGEDCOM Blog:
What is research: Understanding the Kaleidoscope
http://bettergedcom.blogspot.com/2010/12/what-is-research-having-fun-with-body.html

Build a BetterGEDCOM Blog:
What is research: Working with documents about a c1815 estate.
http://bettergedcom.blogspot.com/2010/12/what-is-research-working-with-original.html

Build a BetterGEDCOM Blog:
What is research: Outlining the contents of an American Revolutionary War pension file.
http://bettergedcom.blogspot.com/2010/12/what-is-research-outlining-contents-of.html

They Came Before: Mixing it up: the indirect evidence challenge
http://theycamebefore.blogspot.com/2010/11/mixing-it-up-indirect-evidence.html

They Came Before: One Spoonful at a time ...
http://theycamebefore.blogspot.com/2010/12/one-spoonful-at-time-1816-news-item.html

They Came Before: Any time you connect a Miller ...
http://theycamebefore.blogspot.com/2010/12/any-time-you-connect-miller-its-good.html

They Came Before: Just in time for Christmas ...
http://theycamebefore.blogspot.com/2010/12/just-in-time-for-christmas-tombstone-of.html

They Came Before: Bits and Breadcrumbs
http://theycamebefore.blogspot.com/2010/10/bits-rev-michael8-merrill-preston.html

ttwetmore 2011-04-17T21:40:34-07:00

GeneJ,

We are getting nowhere. You quote link after link but steadfastly refuse to answer the simplest of questions. I don't think you understand what I've been talking about, and I don't think you care to. I see no point in continuing.

Tom Wetmore

mmartineau 2011-04-17T23:06:27-07:00

GeneJ,

Thanks for all the links to previous discussions you hunted down earlier. Reading those discussions helped me immensely in understanding what has gone on before.

hrworth 2011-04-18T01:20:34-07:00

Tom,

I have a real concern about the BetterGEDCOM project being "done" any time soon, if we continue to try to turn the Titanic around, with the iceberg in sight.

The "Titanic" being Evidence Records and a collection of Facts / Events with their associated Citations. What you have been proposing is a "new way" of doing research (new being relative) and the recording of what we find.

Evidence Person, Conclusion Person, etc, may be nice to know and understand, and we probably should be going there, but WHY can't we address the specific issues at hand?

To me, the biggest issue, NOW, for this project, is to address the sharing of Source Information and Citation Information between two applications.

That does not mean that we shouldn't be looking ahead, in fact we should. BUT this project needs to define the "easy stuff", the stuff that is Broken, get the vendors on board, then get them to help US move in a new direction. I think what you have been proposing all along, is a New Direction.

WE can even agree on the sharing of source information and citation information.

Russ

AdrianB38 2011-04-18T03:04:19-07:00

Mike reasonably asks "I have to wonder how many people on this wiki are really trying to design a new genealogy program instead of a new import / export data model" and Gene responds "You would not be alone in that concern".

As someone who has written a page about process, perhaps I should attempt to explain it from my viewpoint.

Modelling the real world of families, people, locations, organisations, etc, is "easy" because we can look away from our desks and PCs and see how the things work in real life. Hence, we can talk easily about the relationships between these things.

When, however, we talk about the world of genealogical research, we are not talking about things that are so common and obvious. Sources, Repositories and Citations are fairly easy because they are universal so we all have a common understanding. Except even there, life gets tricky when we ask what the content of a citation is for an event or attribute.

When we get into concepts like Research Log, Proof Argument / Summary, etc, where not everybody has an application that covers these and those applications that do cover them differently, then we need to sort out what those terms mean. And the simplest way of doing that (for me) is to describe a process for how those things are used.

So - I write a process to determine what the data is and I believe that is very necessary. The art, of course, as you both suggest, is knowing when to stop refining the process and turning it into procedures, algorithms and programs.

hrworth 2011-04-18T03:18:03-07:00

Adrian,

I appreciate your process and all of the work that you have done.

I think we need to do two things. 1) Identify and attempt to define what we know is broken, and define how to fix that, then 2) bring in the "new stuff". The Research Log, Proof Argument, the wonderful work that you and Tom have been doing.

If I have read the Second Life comment, the comments from Mike and GeneJ, as well as my own, I think we need to pause for a moment and see if we can attack and resolve the issues we know about, keeping in mind the new stuff, and try to help educate us End Users on why this new stuff is so important.

As you may know, this project started because two end users could not share, successfully, research information, more specifically Sources and Citations. We have plenty of examples on the Blog.

We more importantly need to get genealogy software developers to help us fix this problem.

My thought is if we get them hooked (on board) with fixing the Source and Citation issue (for example), they may be willing to come to the table to address the new research techniques that you all have been proposing.

Just a thought.

Russ

ttwetmore 2011-04-18T03:30:07-07:00

Russ,

No worries. I've given up. If you've noticed all the people responding, however, you'll see that most people are thoroughly on board the move to add evidence. If we asked them whether we have to turn the Titanic around to get there I doubt many would agree.

You and GeneJ are opposed to the idea. You two are also the least computer savvy of the people actively participating here. I believe the main problems with the effort are trying to run it with a wiki and having it led by people with no technical background.

Better GEDCOM has noble goals and I hope it will be a success.

Tom Wetmore

AdrianB38 2011-04-18T03:39:15-07:00

Russ said "To me, the biggest issue, NOW, for this project, is to address the sharing of Source Information and Citation Information between two applications"

I am torn between the pragmatic short-term concern of sharing what we have already, including citation data, and the long-term view of enabling a data model to cope with the Evidence & Conclusion Model.

However, having thought long and hard and tested various concepts in the pages of this Wiki, let me give you my opinion of the relative difficulties:
- implementing the Evidence & Conclusion Model in a replacement for GEDCOM is EASY.
- implementing citations in a replacement for GEDCOM that can be exchanged between applications and printed out consistently in whatever style the recipient wants, is HARD.

Why is the latter hard? Because any initial analysis of the citation formats / templates / whatever in ESM's EE book starts with HUNDREDS of different elements. Somehow, I am (naively?) convinced, several hundred of those can be reduced to a couple of dozen variants on Author, Date, etc. Which still leaves another several hundred which may (or may not, I just don't know) be so dependent on the type of source document they come from, that they will never be reduced in number. And of course, the number of types of source document increases all the time.

Somehow, we have to concoct a model that enables the transfer of citation data - BUT the sheer volume and volatility of stuff makes it HARD to analyse, to design a solution and to update that solution when new documents come along. There might be a simpler way of doing it than hacking it all out item by item - in fact, there has to be. But what that method is, I don't know yet.

Whereas, as I say, implementing the Evidence & Conclusion Model in a replacement for GEDCOM is EASY. It really is. LDS have already done it, albeit in a truncated 2-level only form. Pretty much all it needs is the pointers from one person to another.

BUT BUT BUT, don't imagine that I'm also arguing that implementing the Evidence & Conclusion Model in an application is as easy. It won't be - there's a whole raft of extra navigation that has to be put into the software to go up and down the conclusion tree. Fortunately, this is not something we have to concern ourselves with. If the developers were starting from fresh it would probably be fairly straightforward but as they'll all want to modify their existing software, it could be dodgy.
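
To make "pointers from one person to another" and "going up and down the conclusion tree" concrete, here is a minimal sketch in Python. The record layout, the field name built_from and the ids are all assumptions made up for illustration; they are not part of any BetterGEDCOM, DeadEnds or LDS definition.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    """A person record. Evidence persons have an empty 'built_from' list."""
    pid: str
    events: dict = field(default_factory=dict)      # e.g. {"BAPM": "1820"}
    built_from: list = field(default_factory=list)  # hypothetical: ids of lower-level persons


def evidence_leaves(pid, db):
    """Walk DOWN the tree and return the ids of the evidence persons underneath."""
    person = db[pid]
    if not person.built_from:
        return [pid]
    leaves = []
    for child in person.built_from:
        leaves.extend(evidence_leaves(child, db))
    return leaves


def conclusions_using(pid, db):
    """Walk UP one level: which persons point down at this one?"""
    return sorted(p.pid for p in db.values() if pid in p.built_from)


# Invented sample data: three evidence persons and two levels of conclusion.
db = {
    "E1": Person("E1", {"BAPM": "1820"}),
    "E2": Person("E2", {"MARR": "1845"}),
    "E3": Person("E3", {"DEAT": "1890"}),
    "C1": Person("C1", built_from=["E1", "E2"]),
    "C2": Person("C2", built_from=["C1", "E3"]),
}

print(evidence_leaves("C2", db))
print(conclusions_using("E1", db))
```

Nothing is copied or merged away here: the evidence records stay intact and the conclusion persons only point at them, which is why the data side is cheap and the navigation burden falls on the application.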

Actually - I'll tell you what I think the biggest issue is - it's not the Evidence & Conclusion Model, it's not creating a format for citations. It's answering this question - WHY should the vendors come on board?

hrworth 2011-04-18T03:58:45-07:00

Adrian,

I guess Tom is right. I don't know what I am talking about.

I am NOT saying that the Sources and Citation issue is Easy, but what I AM saying, that if you look at Evidence Explained!, you will see a series of Fields (field names), right? They are strung together.

Don't most databases understand fields and how to string them together? My reading of a "First (Full) Reference Note" is a series of Fields that are in a certain sequence. If you were to look at Roots Magic, for example, you can see those field names. Other programs don't present those field names, but behind the screens are those (probably) same field names.

<fieldname> text </fieldname> in what ever format developers want, like levels in the current GEDCOM or some other set of rules, could be generated and received for presentation to the other end user.

I do understand, this is not as easy as it sounds.
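
A rough sketch of the "fields strung together" idea, in Python. The field names, the sample values and the field order are invented for illustration; they are not taken from Evidence Explained!, Roots Magic or any agreed BetterGEDCOM format.

```python
import xml.etree.ElementTree as ET

# Hypothetical citation fields and values, made up for this example.
citation_fields = {
    "author": "John Smith",
    "title": "Parish Registers of Anytown",
    "year": "1901",
    "page": "42",
}

# Hypothetical order in which the receiving program strings the fields together.
TEMPLATE = ["author", "title", "year", "page"]


def to_xml(fields):
    """Sending side: write each field as <fieldname>text</fieldname>."""
    root = ET.Element("citation")
    for name, text in fields.items():
        ET.SubElement(root, name).text = text
    return ET.tostring(root, encoding="unicode")


def from_xml(xml_text):
    """Receiving side: recover the field names and their text."""
    return {child.tag: child.text for child in ET.fromstring(xml_text)}


def reference_note(fields, order=TEMPLATE):
    """String the fields together in the recipient's preferred sequence."""
    return ", ".join(fields[name] for name in order if name in fields) + "."


received = from_xml(to_xml(citation_fields))
print(reference_note(received))
```

The transfer itself is the easy part. The sketch only works because both ends happen to agree on what "author" or "page" means, and that agreement is where the hard work lies.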

Why should we have vendors on board? Maybe that is the real issue here. IF the Vendors are NOT on board with the project, how am I, a stupid end user, ever going to see a successful exchange of information? If the vendors are not on board, who will transform the data in my software to a BetterGEDCOM file, and who will recover that data when it gets to the other end?

What am I missing?

Thank you,

Russ

AdrianB38 2011-04-18T04:24:05-07:00

Russ - you said "I am NOT saying that the Sources and Citation issue is Easy". Good. My point for bringing up the relative difficulty of Citation vs. Evidence & Conclusion Model is simply to indicate that discarding the E&C Model wouldn't actually save us anything in the way of time or workload.

Note that all this is based on the idea that using E&C is an OPTION in the software. Even I would probably only use it occasionally, e.g., since I'm so far down the road of conclusion-only people - I can no longer easily see the logic that led me to believe that this guy over here being baptised, is the same as that guy over there being married.

Your idea of concocting some sort of template for citations is probably the right way to go but there is a balance to be struck in there between sticking stuff, any stuff, into a template where it can be transferred but not understood (because there's no agreed, world-wide name or definition), and creating an official BG description for an item where the item is so important that all software needs to see an explicit definition of it.
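
One way to picture that balance, as a sketch only: a small table of officially defined elements (with variants folded into them), and everything else carried across untouched. The element names and aliases below are assumptions for illustration, not a proposed BG list.

```python
# Hypothetical mapping of source-specific field names onto a few defined elements.
CORE_ALIASES = {
    "author": "author",
    "compiler": "author",
    "title": "title",
    "date": "date",
    "year": "date",
}


def normalise(fields):
    """Split fields into defined elements and opaque pass-through items."""
    core, passthrough = {}, {}
    for name, value in fields.items():
        if name in CORE_ALIASES:
            # Understood by every program: file it under the defined element.
            core.setdefault(CORE_ALIASES[name], []).append(value)
        else:
            # Transferred but not understood: keep the sender's name and text.
            passthrough[name] = value
    return core, passthrough


# Invented sample citation with one field no definition covers.
core, passthrough = normalise(
    {"compiler": "Jane Doe", "year": "1850", "roll": "M432-17"}
)
print(core)
print(passthrough)
```

The judgment call is which names earn a place in the defined table and which stay in the pass-through bucket.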

And yes, I would agree with the view that getting vendors on board IS the biggest issue. Without them, as you don't quite say, neither the geekiest end-user nor the end-user with no IT knowledge, will ever see successful data transfer.

And I still don't see any compelling reason for the big boys, who are software guys, not genealogists, to come on board.

AdrianB38 2011-04-18T05:16:47-07:00

Gene - let me try and allay your concerns.

Firstly, the general philosophical point is that the Data Model that we come up with for BG MUST allow for all sorts of methods of working. If we put all sorts of mandatory relationships or processes in, for instance, then the new-starters in genealogy are going to get so confused that they'll pack up and go and watch the ball-game. And even the experts have different ways of working <grin>.

You said, a while ago: "I don't know that all of us have agreed to have the source system as the clearing house for the record capture". By reason of your words "software source system creat[ing] a master source, then I decide the further dispensation of the record..." I interpret the phrase "source system" as meaning those screens and routines within your genealogy app that deal with the entry of sources. Then you say "I don't see myself creating an evidence person". That's fair enough. The whole point of the E&C Model _and_ the philosophy is that anyone who wants to create an evidence person (or persona in LDS nFS speak) from a source record, should be able to do so. And anyone who doesn't want to, doesn't need to.

So while we certainly haven't all agreed that inputting a source is in effect entering a clearing house, neither should we need to, if it's optional.

You also referred to Tom's example having "the information snippet and logic and reasoning being recorded in the database proper" and said further "If the source system is not the clearing house for these snippets and logic/reasoning, I object"

I interpret this to mean Tom's example having the proof argument / proof summary / similar justification being recorded against the event (or whatever it was) (as a note I think it was) and that you wanted it with the source and citation data.

Again, I think the whole point of the philosophy is that someone should be able to put the proof argument / proof summary / similar wherever they want. Tom happens to think it works as a note at the appropriate point. You happen to think it works best as part of the citation (or as a link from within the citation). Me - I don't want to adopt either of your solutions because (a) I'm an analytical guy that wants to split my entities apart, which means splitting my proof argument etc out from the citation but also (b) I don't see enough pointers and links in Tom's notes to enable me to get back to the very beginning of the research chain. (Sure, I can read it and work it out backwards but that's boring.)

To make all 3 views work, we actually need a data model that includes sources, citations, research notes, proof arguments, shared notes and probably loads more. Then we can all press whatever button we like. In particular, you can use your "source system" (i.e. the source and citation data) as the clearing house for your logic and reasoning, and Tom can use notes, and I can use these other screens. If they exist. And by the way, this issue actually has nothing to do with the Evidence & Conclusion Model and everything to do with where we store "proof" data etc, which applies in all methods of working.

Finally the idea that we need to "vet the evidence person concept before we make it the clearing house for anything". Again, similar thing applies - if it's in the Data Model, you don't have to use it. You can control stuff the way you always have.

In summary, the E&C Model contains within it the ability to work in the old-fashioned way so, providing we don't make anything stupid mandatory, _everyone_ can carry on how they want.

Bear in mind that I am talking about a Data Model - what happens in an application using the Evidence & Conclusion Data Model is not something I can promise you anything about. The arguments about working methods are ones you'll need to have with the writers of the software and you do have a choice there. (Assuming any of them are interested in change).

mmartineau 2011-04-17T10:18:32-07:00

The post referred to here:

https://bettergedcom.wikispaces.com/message/view/Defining+E%26C+for+BetterGEDCOM/37787542#37787548

Can be found here:

http://bettergedcom.wikispaces.com/message/view/Research+Process%2C+Evidence+%26+GPS/37632782#37734938

mmartineau 2011-04-17T10:35:31-07:00

Tom,

Excellent example!

On an unrelated note, when you use the [[code]] be sure to put in forced \n (line feeds), otherwise long lines in the [[code]] tags stretch out the entire thread.

AdrianB38 2011-04-17T11:11:49-07:00

Tom,
I think that works for me. Alter the minimum amount possible and just let the software float up to the top of the tree. Or sink down, depending on where you start.

I must admit I'd not noticed before that the top person in the tree possibly only consists of links pointing down. That results in a lot less copying than I had in mind because I thought the merged person would include all the events and attributes.

In which case - and I think you did mention this somewhere but... - how would you suppress one of the lower events or attributes if your manual decision was that the birth date off evidence-1 was the "proper" date so should be taken and suppress that off evidence-2 (as distinct from keeping both as alternatives)?

ttwetmore 2011-04-17T11:59:33-07:00

Adrian,

The $64K question. You have to indicate your preferred attributes somehow. And you only have to do this in the cases where there is duplication or conflict.

One solution is to "copy up" the ones you want.

Another solution would be to use a scheme like:

0 @I22@ INDI
  1 NAME @I2@ <<-- take the name from evidence person 2
  1 BIRT @I2@ <<-- take the birth event from evidence person 2
    2 DATE @I3@ <<-- except override the birth date only and take it from evidence person 3 
  1 DEAT @I4@ <<-- take all death information from evidence person 4

Again, the user wouldn't have to do this ugly stuff by hand or even see it. The user interface would make it simple. For example, the NewFamilySearch tree has exactly this functionality. When you use NewFamilySearch you can combine any number of persona records into a person record, so person records can contain gobs of redundant and conflicting information. There is a user interface in NewFamilySearch that gives you pop-up menus for every attribute, showing the multitude of values that come from the various personas. You choose the preferred value in those menus. The chosen values are the ones displayed for the person, while all the other ones remain available but tucked out of the way. The NewFamilySearch application is like a wiki in the sense that after you make your preferred selections, anyone else can change them. They can even redistribute the personas to different persons if they choose.

You can use your imagination to see some of the ramifications of this approach. For example, when citations are generated for a person, that citation can be customized to only cover the sources that you have chosen for the attributes to be displayed. You would never have to worry about "junk" that got added to a person showing up in any of your reports and so on.
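
As a rough illustration of the cross-reference scheme above, here is a minimal Python sketch of how software might resolve the preferred values and work out which sources to cite. The dictionaries, the "from"/"overrides" keys, the source ids and all the names and dates are invented for the example; none of this is part of GEDCOM or of any proposal on this wiki.

```python
# Evidence persons keyed by id. All values are made up for illustration.
EVIDENCE = {
    "I2": {"source": "S2", "NAME": "John Smith",
           "BIRT": {"DATE": "1 Jan 1800", "PLAC": "Town A"}},
    "I3": {"source": "S3", "BIRT": {"DATE": "2 Jan 1800"}},
    "I4": {"source": "S4", "DEAT": {"DATE": "3 Mar 1870", "PLAC": "Town B"}},
}

# Preferences mirroring the scheme: take NAME and BIRT from I2, except the
# birth DATE from I3, and take all death information from I4.
PREFERENCES = {
    "NAME": {"from": "I2"},
    "BIRT": {"from": "I2", "overrides": {"DATE": "I3"}},
    "DEAT": {"from": "I4"},
}

def resolve(preferences, evidence):
    """Return (displayed person, set of source ids actually used)."""
    person, used = {}, set()
    for tag, pref in preferences.items():
        base = evidence[pref["from"]]
        value = base[tag]
        used.add(base["source"])
        if isinstance(value, dict):
            value = dict(value)  # copy so the evidence record is never changed
            for sub, ev_id in pref.get("overrides", {}).items():
                value[sub] = evidence[ev_id][tag][sub]
                used.add(evidence[ev_id]["source"])
        person[tag] = value
    return person, used

person, sources = resolve(PREFERENCES, EVIDENCE)
print(person["BIRT"])
print(sorted(sources))
```

Because the function also returns the set of sources it drew on, a report could cite only those sources, and anything not selected stays in the evidence records untouched.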

Tom Wetmore

mmartineau 2011-04-17T13:06:11-07:00

"Representative project/examples? Case study/studies?"

GeneJ, I'm not sure if you are referring to my earlier statement of "I would like to see more concrete examples like this because it helps me better understand EXACTLY what the person is trying to say. Otherwise, as others have previously stated, it's easy to misunderstand what they mean." or not, but if you are, I would like to see both, case studies and project examples. I also would like to see those case studies and project examples demonstrated within the scope of the E&C model Tom is proposing.

I wonder if you would put together a case study and then if we can get someone (Tom?) to show how to represent the data in the E&C model proposed. You may have already explained a case study on the wiki somewhere. If so, put a link in so we can find it.

I really think what Tom proposes can work in all cases with a few tweaks here and there, so maybe if we can get an actual example put together, it will alleviate any misgivings.

AdrianB38 2011-04-17T13:09:33-07:00

I think I prefer the "copy up" the ones I want approach because this gives the flexibility to actually concoct a different value again - for instance "Charles E Taylor" on one, "Edward Taylor" on another, combines to form "Charles Edward Taylor". Yes, if you really, really wanted to describe each bit of the name, you could do it but I've spent too long writing Visual Basic commands to hack phrases around to want to go there again.

Of course, your 2nd scheme could also allow a similar means to create the different value but then it has 2 ways to record a value (actual or cross reference), which seems unnecessarily complex.

We'd also need a means to suppress a value, e.g.
1 DEAT REMOVE
to remove the death event from someone who isn't dead.

And... not sure how I'd do this yet - if you had a list of occupations on 1 evidence person plus another list of occupations on the 2nd evidence person and you wanted to suppress just some from one... Maybe you'd need to have a rule that says "If there is copy-up, then what you now have is the full list" - which seems sensibly simple anyway.
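
To make the "copy-up means full list" rule concrete, here is a minimal Python sketch. It assumes each person is a plain dictionary of tag to list of values, and it treats REMOVE as a hypothetical marker, as in the "1 DEAT REMOVE" line above. The occupations and the death entry are invented; the names are the ones from the example above.

```python
REMOVE = "REMOVE"  # hypothetical marker, as in "1 DEAT REMOVE"

def effective_values(tag, conclusion, evidence_persons):
    """Values of `tag` to show for a conclusion person."""
    if tag in conclusion:
        own = conclusion[tag]
        if own == REMOVE:
            return []        # explicitly suppressed
        return list(own)     # copy-up: what is here is now the full list
    # No copy-up: inherit everything from the evidence persons, in order,
    # dropping exact duplicates.
    inherited = []
    for ev in evidence_persons:
        for value in ev.get(tag, []):
            if value not in inherited:
                inherited.append(value)
    return inherited

ev1 = {"NAME": ["Charles E Taylor"], "OCCU": ["weaver", "farmer"],
       "DEAT": ["(a death event recorded in error)"]}
ev2 = {"NAME": ["Edward Taylor"], "OCCU": ["farmer", "labourer"]}
conclusion = {"NAME": ["Charles Edward Taylor"], "DEAT": REMOVE}

print(effective_values("NAME", conclusion, [ev1, ev2]))
print(effective_values("OCCU", conclusion, [ev1, ev2]))
print(effective_values("DEAT", conclusion, [ev1, ev2]))
```

Suppressing just some occupations then needs no special syntax: copying up the occupations you want to keep replaces the inherited list.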

GeneJ 2011-04-17T15:08:16-07:00

@TestUser:

"[why do you think]...Tom's proposal would separate a snippet from its author's identity, or how it would separate the "snippet" from a delayed birth record from the information that it was delayed? ... I may be missing something, but I don't see a reason to be afraid for this to happen?"

I think Tom and I have hashed through this over time. (À la the information IN a source vs. information ABOUT a source.)

To find these discussion references just takes forever.

http://bettergedcom.wikispaces.com/message/view/Glossary+Of+Terms/35106830#35337008

http://bettergedcom.wikispaces.com/message/view/Glossary+Of+Terms/35106830#35338108

"I extract genealogical information from that source and call it evidence..."

We took an oversimplified example way too far here:

Direct Model Support for the Evidence and Conclusion Process
http://bettergedcom.wikispaces.com/message/view/Evidence+and+Conclusion+Process/35338758#35623816

Please let me know if that didn't answer your question.

Separately and perhaps a little indirect... Say I'm talking to two people--one being a data person and the other better characterized as a user :)--and I mention the "GenTech" like E&C (don't shoot me again, Tom)--the data person's face lights up, almost like they feel finally at home.

If I continue to describe the "evidence person" concept, the user's got a furrowed brow. Their eyes glaze over about the time the data guy starts to drool.

If I try to talk about why, the user says, "I can do all that without this thing." When I explained all the sorting benefits the other night, one person sent me a note: "Oh, I agree, but I heard Microsoft has this new product called Excel."
More than one person has called it "old technology" and asked me why the data side doesn't "get it."

(See, they might want to shoot me too.)

GeneJ 2011-04-17T16:43:17-07:00

Adrian wrote, "[what do you mean by...] source system as the clearing house for the record capture.."

I want to grab the record and the online citation and have my software source system create a "master source." Then I decide the further dispensation of the record. I'd probably always send it to the Admin-Research (research log). Occasionally it might slide right into someone's individual record.

As I 'splained in the last meeting, I don't see myself creating an evidence person.

Adrian wrote, "[what did you mean by...] the information snippet and logic and reasoning being recorded in the database proper"

Sorry, I don't have a good way of distinguishing between the Admin-Research area, the individual, places and ships section and the "source structure" sections of the database. I'm sure y'all will let me know.

Please let me know if that did not answer your question.

GeneJ 2011-04-17T16:59:15-07:00

@Tom,

You wrote, "I don't understand ...snippets"
Humm... I think it's what you call evidence. Does the response above to testuser help?

You wrote, "the clearing house for the record capture ... I don't know what this means."
Please see the statement in context.
http://bettergedcom.wikispaces.com/message/view/Research+Process%2C+Evidence+%26+GPS/37632782?o=20#37739724
Perhaps also Louis' comment here:
http://bettergedcom.wikispaces.com/message/view/Research+Process%2C+Evidence+%26+GPS/37632782?o=20#37739724
With the link to:
http://bettergedcom.wikispaces.com/message/view/BetterGEDCOM+Comparisons/32431138?o=60#32692308
And the discussion here...
http://bettergedcom.wikispaces.com/message/view/BetterGEDCOM+Comparisons/32431138?o=60#32692308

You wrote, "[what did you mean by]..database proper."
See my above comment to Adrian, please. We don't have definitions for everything we'd necessarily like to have defined.

More to write .. have a meeting now. --GJ

ttwetmore 2011-04-17T18:37:32-07:00

GeneJ,

This doesn't answer my questions, but it's not important. It's clear from your answer to Adrian and your general skepticism about evidence records in general, that you don't need them, you don't want them, you believe they have nothing to offer. Your definition of the E&C process is to collect records, and when you decide they apply to a person of interest, you add facts from the records to a person in your database. This is the person-based methodology. But most of us are interested in achieving the advantages of record-based methodology, and to enable this methodology our evidence must be structured into records that computers can process.

Tom Wetmore

mmartineau 2011-04-17T18:37:34-07:00

"As I 'splained in the last meeting, I don't see myself creating an evidence person"

It sounds like you are referring to what a genealogy application will present and allow you to do, not do, require etc in the user interface. The underlying data model doesn't have to be that coupled to the user interface of the application. Just because we talk about "evidence" persons, doesn't mean that a software application has to present it to you the user in that way. In fact current genealogy programs can continue to operate the exact same as they do now. The only difference would be in what format the data is exported/imported.

An example is the Family Pursuit internal data model. Its data model has no concept of a family and yet the application can display families just like any other genealogy program. You can add and remove children/parents from a family just like any other genealogy program. When data is exported via GEDCOM it is put into families and exported that way, but that does not mean it's stored that way internally.
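
To show that this kind of thing is possible in principle, here is a minimal Python sketch that stores only parent-child links and derives family groupings at export time. This is not Family Pursuit's actual data model; the link list, the ids and the function name are all assumptions made for the example.

```python
from collections import defaultdict

# (parent_id, child_id) links only; there is no family record anywhere.
# The ids are hypothetical.
PARENT_OF = [
    ("I2", "I1"), ("I3", "I1"),
    ("I2", "I7"), ("I3", "I7"),
    ("I2", "I8"),
]

def derive_families(parent_links):
    """Group children by their exact set of known parents."""
    parents_of = defaultdict(set)
    for parent, child in parent_links:
        parents_of[child].add(parent)
    families = defaultdict(list)
    for child, parents in parents_of.items():
        families[frozenset(parents)].append(child)
    return {parents: sorted(children) for parents, children in families.items()}

for parents, children in derive_families(PARENT_OF).items():
    print(sorted(parents), "->", children)
```

An exporter could write one FAM record per derived grouping, while the application itself never stores a family.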

The advantage of an E&C model is that not only will it support the old paradigm of data storage/transfer, but it will open up the possibilities of future genealogy software that can take advantage of the approach and be able to transfer that data to other genealogy software applications that also support this new paradigm.

This may not be anything new to you or others, but I just wanted to throw it out there just in case.

mmartineau 2011-04-17T19:01:53-07:00

In fact, now that I'm thinking about it, I have to wonder how many people on this wiki are really trying to design a new genealogy program instead of a new import/export data model. Thinking back on several discussions now I wonder if that is the true source of disagreements. According to the Goals page on this wiki -

"The Goal of the BetterGEDCOM Project is:

BetterGEDCOM will be a file format for the exchange and long-term storage of genealogical data.
It will be more comprehensive than existing formats and so become the format of choice."

This in my mind clearly states the purpose is to define a "file format for the exchange ... of genealogical data", not a genealogical research process - which would be what a genealogy program does.

GeneJ 2011-04-16T21:23:07-07:00

http://bettergedcom.wikispaces.com/message/view/Research+Process%2C+Evidence+%26+GPS/37632782?o=20#37776776

Mike writes, "On Friday 8:26am (Mountain Time), Tom presented a concrete solution that elegantly solves the problem Adrian introduced. Does anyone have a reason why this model does not solve the problem? I would like to see more concrete examples like this because it helps me better understand EXACTLY what the person is trying to say. Otherwise, as others have previously stated, it's easy to misunderstand what they mean."

I don't know that all of us have agreed to have the source system as the clearing house for the record capture.

Tom's example was only enabling a citation to be "created"--I think in his example he still had the information snippet and logic and reasoning being recorded in the database proper.

If the source system is not the clearing house for these snippets and logic/reasoning, I object.

You wanted examples.

(1) I do not believe information in the source is more important than information about the source. How can you separate a snippet from its author's identity as though that snippet has some separate value? Ditto, how can you separate a snippet from the identity of its "source of the source" and believe that snippet has some separate value? How can you separate the "snippet" from a delayed birth record from the information that it was delayed? We know that sources come in all flavors. It seems to me you're trying to carefully package all the fruit in boxes and then strip the labels off.
Please let me know if you need a list of authorities for the above.

(2) The "evidence person" concept is not vetted from my perspective. Surely we have to vet the evidence person concept before we make it the clearing house for anything.

GeneJ 2011-04-17T00:33:37-07:00

Representative project/examples? Case study/studies?

testuser42 2011-04-17T04:06:19-07:00

Hi Gene,

could you explain to me why you think that Tom's proposal would separate a snippet from its author's identity, or how it would separate the "snippet" from a delayed birth record from the information that it was delayed?
I may be missing something, but I don't see a reason to be afraid for this to happen?

AdrianB38 2011-04-17T05:26:39-07:00

Gene - re "source system as the clearing house for the record capture" and "the information snippet and logic and reasoning being recorded in the database proper"

Can you please explain what you mean by those 2 phrases? I might take a guess but, as we know, my guessing success rate isn't good.

In particular, what's "the database proper"? Because in my IT-based view of the world, the database-proper represents all the stuff stored in or by the application, so would include people, places, families, etc (call that the real-world-side if you like); the source and repository records; and the research logs, tasks, proof statements, etc. (I've split those into 3 just in case it's useful to what you mean:
- real world;
- internal to the study of genealogy (as per current GEDCOM)
- internal to the study of genealogy (excluded from current GEDCOM) )

ttwetmore 2011-04-17T06:48:05-07:00

I don't understand GeneJ's comments about snippets and my approach very well, but I will try to respond.

"I don't know that all of us have agreed to have the source system as the clearing house for the record capture."

I don't know what this means. We record sources and we record what we find in sources. We add notes and thoughts. Then we use what we find to decide who was who and why. We record why we think what we think and we put those thoughts in proper places. This is the GPS fully supported. Why do we need to worry about a "source system as a clearing house" before we talk about this? Why can't we just decide on the model, the entities it contains, how they relate to one another, and what kind of information we put into each one?

"Tom's example was only enabling a citation to be "created"--I think in his example he still had the information snippet and logic and reasoning being recorded in the database proper."

I have everything stored in the database. For me that's the point in having a genealogical system. My source records are in there, my evidence records are in there (I think they are a combination of your citations and now snippets), and my notes and reasoning and conclusions are in there. And they link and relate to one another exactly as logic and common sense demand and exactly as required by the GPS.

"If the source system is not the clearing house for these snippets and logic/reasoning, I object."

This implies to me that you think of your "source system" and your "database proper" as different things. Does this mean you think Better GEDCOM needs two separate "sub-systems" -- the part you use when collecting evidence and reasoning about it -- and the part you use once you make your decisions and you want to build family groups and pedigrees? I don't know this for sure, because you bring up and use new terms without definitions -- in this case snippet, source system, and database proper. Doing this would cripple our genealogical applications to the point that they would only be able to do what, well, what applications of today can do.

In the DeadEnds model, the "source system" and the "database proper" are combined together into a single model and all the records defined by that model are stored in the same database. Is this the basis of your objection? You think they must be kept separate and independent? The "source system" is, I can only guess for you, the parts of the model that support the records-based methodology (repositories, sources, evidence, citations), and the "database proper" is the part that supports person-based methodology. And for some reason you want to keep them separate? I don't understand why anyone would want that.

"You wanted examples.

"(1) I do not believe information in the source is more important than information about the source. How can you separate a snippet from its author's identity as though that snippet has some separate value? Ditto, how can you separate a snippet from the identity of its "source of the source" and believe that snippet has some separate value? How can you separate the "snippet" from a delayed birth record from the information that it was delayed? We know that sources come in all flavors. It seems to me you're trying to carefully package all the fruit in boxes and then strip the labels off.
Please let me know if you need a list of authorities for the above."

There is nothing in the model that makes one kind of information more important than another. Every item of information is linked to the items it depends on. You may misunderstand the notion of separation in a modeling or a database sense. Your notion of separation seems to be one of "you can't get there from here", whereas the modeling notion of separation is one of clarifying and implementing the proper relationships between things. Yes, an evidence person is "different" from the source record that defines where it came from, but the evidence persons link to those source records through a relationship. They are different things, as they must be for the purposes of modeling and computation, but they cannot be called separate. You have to think of these databases, not as filled with millions of unrelated and different types of records, but as a network of objects in which every record may be related to and linked to many other records. Take for example a "conclusion person". It is linked to every evidence person that a researcher decided represents that real person. Those evidence persons contain all the citation detail and other notes recorded for them, and they also link to the source records they came from. Thus, the software, having access to the conclusion person, also has access to all the evidence, citation info, notes, thoughts, conclusions the researcher has brought together. There is no distinction between a source system clearing house or a database proper; we have a single unified database based on a single unified model that has the entities and relationships needed to support the entire GPS.
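
The "network of linked records" idea can be sketched in a few lines of Python. This is only an illustration under assumed names: the record ids, the "source" and "evidence" keys and the function are hypothetical, and real records would carry far more detail.

```python
# A tiny linked-record store. Evidence persons point at the source they
# came from; the conclusion person points at its evidence persons.
RECORDS = {
    "S1": {"type": "source"},
    "S2": {"type": "source"},
    "E1": {"type": "evidence person", "source": "S1"},
    "E2": {"type": "evidence person", "source": "S2"},
    "C1": {"type": "conclusion person", "evidence": ["E1", "E2"]},
}

def sources_behind(rec_id, records):
    """Collect every source id reachable from a record by following links down."""
    rec = records[rec_id]
    found = set()
    if "source" in rec:
        found.add(rec["source"])
    for lower in rec.get("evidence", []):
        found |= sources_behind(lower, records)
    return found

print(sorted(sources_behind("C1", RECORDS)))
```

Starting from the conclusion person, the software reaches every source without the records ever being merged or the labels stripped off.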

"(2) The "evidence person" concept is not vetted from my perspective. Surely we have to vet the evidence person concept before we make it the clearing house for anything."

The evidence person concept has been fully vetted many times as the key addition to genealogical systems that work in the records-based paradigm. In a recent response to you I described three computing systems that deal directly with the records-based world, and all three of them used this concept as the key to their implementations. When the same concept is independently "invented" by three different teams, as the core concept they need to implement records-based handling of person-based data, it is hard to make an argument that the concept has not been vetted. And beyond the three working applications I described, the evidence person is also a key concept in the GenTech model, where it is called the persona. Even though I don't like the GenTech model, it is hard to say that the evidence person record was not vetted by them, as a fourth example.

Tom Wetmore

AdrianB38 2011-04-17T09:03:13-07:00

Re "On Friday 8:26am (Mountain Time), Tom presented a concrete solution that elegantly solves the problem Adrian introduced. Does anyone have a reason why this model does not solve the problem?"

(This was in the discussion "The Missing Link - a new entity type or a new type of source?" for page "Research Process, Evidence & GPS" )

(Incidentally - that's Friday, 3:26 pm on my screen, so proving this Wiki stores a standard time and translates it to my local time when displaying it. Which I kinda thought was happening - unless some of you guys really are working away at 5 a.m.)

The issue referred to is basically how to represent the justification for merging 2 personas / evidence people. At least, I think that's what it was about!

What my page "Research Process, Evidence & GPS" is telling me is that the output things associated with the research process are entities. A proof argument / proof summary / whatever, might explain why person X married in 1850 and another person of the same name baptised in 1825, really are judged to be the same person. (It might be a link to an argument whose text is elsewhere or it might be the full argument or...). Whatever it is, this "proof" is an entity in its own right. And I'll have lots of these in my database.

Now, while it's an entity in its own right, that doesn't mean I would expect the "proof" entity type to be physically separate from the research goal entity type to be physically separate from the research log entry entity type etc.

It is possible that these could all be sub-types of one generic entity table in the database (e.g. "Research Item"). It is even possible these could be sub-types of the Source entity type or the ... - I just haven't sorted it out yet.

So - I don't really, to be unhelpful, yet have a fully formed opinion whether Tom's proposal of a structure like this:
0 @I7@ INDI
1 INDI @I4@ <<-- the conclusion person he created.
1 INDI @I9@ <<-- the conclusion person for the Nova Scotia persons.
1 SOUR @I666@ <<-- it's him making the conclusion again
2 TEXT ... <<-- his words on why these two persons are probably the same

... would satisfy me. However, I do have concerns. My major concern is that this is all text - I don't see how to navigate back from the "words on why these two persons are probably the same" to the entities containing the research details that went into providing this conclusion. I would suggest that there needs to be a cross reference to point to the "proof" entity.

I am loath to create an extra entity type to represent that "proof" and link to it when means already exist to point back to something - e.g. to a Shared Note or to a Source.

One could create a Source to hold the text of the "proof". I think there are 2 issues with this - firstly many people cannot accept the concept of a Source created "inside" the system (as a proof-source would be) and imagine a source must exist somewhere out in the real world. The concept of an internally generated source doesn't worry me but as a mathematician I'm used to iterations and generalisations (e.g. a Set of Sets). However, the 2nd issue of using a source record to physically contain a proof argument, is that a conventional proof argument will contain text with citations for bibliography, footnotes, etc. None of this typically appears in the text linked with a source. (Other items may point to the repository, etc. for the source but that's different from a whole series of citation references inside text.)

I am, therefore, inclined towards creating a sub-type of the Shared Note entity (I call it 'Shared Note' meaning it's the one equivalent to the Level 0 type of note in GEDCOM, not the in-line note.) This (or these) sub-types of the Shared Note entity could be created to contain the research notes and - interestingly - should therefore appear as visible notes in a program that did not implement research notes, logs, etc., rather than be rejected. (Updating them would be very dangerous but at least they would be visible).
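
As a rough sketch of that sub-type idea in Python: the class names, the "joins" and "cites" fields and the note id are all hypothetical, and the record ids simply reuse the ones from the structure quoted above.

```python
from dataclasses import dataclass, field

@dataclass
class SharedNote:
    id: str
    text: str

@dataclass
class ProofNote(SharedNote):
    # Hypothetical extra fields: which records the proof joins together,
    # and which sources or research items its argument relies on.
    joins: list = field(default_factory=list)
    cites: list = field(default_factory=list)

def render_for_plain_note_program(note):
    """A program that only knows plain shared notes can still show the text."""
    return note.text

proof = ProofNote(
    id="N1",
    text="... words on why these two persons are probably the same ...",
    joins=["I4", "I9"],
    cites=["I666"],
)

print(isinstance(proof, SharedNote))
print(render_for_plain_note_program(proof))
```

The cross references give a way to navigate from the words back to the records behind them, while a program that ignores the extra fields still sees an ordinary note.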

Whether or not all research "things" could act as sub-types of Shared Notes, I personally have no idea since I've not yet sorted them all out in my head. But that's how I'm thinking about recording the justification for merging 2 evidence people / personas.

BUT BUT BUT... The topic of how to point back to research notes and proof arguments is one that applies to both the Evidence & Conclusion Model and the conventional Conclusion-only Model. So whether you agree with my ideas on how to point back to research or not, does NOT impact on how to progress the Evidence & Conclusion Model.

AdrianB38 2011-04-17T09:41:30-07:00

But what does impact on how to progress the Evidence & Conclusion Model...

I am not sure if these topics have been fully discussed or not. (If they have been, feel free to contradict yourself - I'll never know).

In general terms, I suspect that a description of the Evidence & Conclusion Model will have to include descriptions of how to update the entities concerned since the "methods" (to use an IT term) will not be obvious as the Evidence & Conclusion Model is not part of the real world, but part of the world of genealogy where each of us may have a different methodology. A couple of these update methods that need to be defined (with examples) are:
- how to update (or not) a family (otherwise unaltered) containing a person whose records have been superseded by a higher level person in the hierarchy
- confirmation that evidence and conclusion applies to more than just persons. And if so, how do we deal with an update of a place when that place is referenced across the database?
- how to update a family when we are adding new data about the family's events and therefore creating a new conclusion family? And what about the people who are members of the family?

These are probably just aspects of the same thing or can all be solved by the same ideas.

Example 1 - updating a group. Suppose I have a source record for the "Xshire Militia". I create an evidence group for the "Xshire Militia" containing just the information from that source. (This is the equivalent of a persona but for a group. And no, I'm not going to call it a groupa).

Then I have a source record for the "Royal Xshire Militia". I create an evidence group for the "Royal Xshire Militia" containing just the information from that source.

Then I have a Source record saying that the "Royal Xshire Militia" is simply the "Xshire Militia", renamed in year Y.

If I am working on the Conclusion only model then I do a destructive merge of those 2 groups.

If I am working on the Evidence & Conclusion model then I ought to apply the E&C principle to everything, so I create a conclusion-group with 2 dated attributes for Name, and point the new conclusion-group back to the 2 evidence-groups.

Now - what about all my relatives who served in the militia? They'll be pointing to either Group1 ("Xshire Militia") or Group2 ("Royal Xshire Militia"). But Group1 and Group2 have now been superseded. I reckon there are 2 possibilities
(1) amend all the references to Group1 and Group2 to read "Group3"
(2) leave them untouched but make the software follow the Group reference up the conclusion tree

Method 1 is a pain because any amendment also needs to create a new conclusion person which just goes on and on and on.

Method 2 makes more sense to me - if the software writes a report about group membership, it comes to "John Doe" and his membership of Group1, finds that Group3 has superseded Group1 (because Group3 points back to Group1), and skips on to Group3 to get the details of the Militia for the report.

I reckon similar principles apply to updating a family, where the family's events and attributes do not change - if person P1 has been superseded in the conclusion tree by person P2, then don't change the family membership to replace P1 by P2, simply allow the software for the Family Reports to navigate first of all to P1, then float up the tree to P2.
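
Method 2 can be sketched in Python as follows. This is a minimal illustration that assumes each record is superseded by at most one higher-level record; the "built_from" key, the record ids and the function names are invented for the example.

```python
# Group3 is the conclusion-group pointing back to the two evidence-groups.
RECORDS = {
    "Group1": {"name": "Xshire Militia"},
    "Group2": {"name": "Royal Xshire Militia"},
    "Group3": {"names": ["Xshire Militia", "Royal Xshire Militia"],
               "built_from": ["Group1", "Group2"]},
}

# Memberships are left untouched: John Doe still points at Group1.
MEMBERSHIP = {"John Doe": "Group1"}

def build_superseded_by(records):
    """Map each lower-level id to the id of the record that points down to it."""
    parent = {}
    for rec_id, rec in records.items():
        for child in rec.get("built_from", []):
            parent[child] = rec_id
    return parent

def float_up(rec_id, parent):
    """Follow supersession links to the top of the conclusion tree."""
    seen = set()
    while rec_id in parent:
        if rec_id in seen:
            raise ValueError("cycle in conclusion tree")
        seen.add(rec_id)
        rec_id = parent[rec_id]
    return rec_id

parent = build_superseded_by(RECORDS)
print(float_up(MEMBERSHIP["John Doe"], parent))
```

The same walk would serve for a person P1 superseded by P2 in a family report: the family keeps pointing at P1 and the report floats up to P2.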

I think the other instances I asked about can be solved similarly.

Is this right? Missing something?

ttwetmore 2011-04-17T10:15:59-07:00

Adrian,

Your method two in the Xshire Militia example is definitely the one to use.

The idea of "building a tree" of conclusions should always be to defer to the lower level records when possible. The links are there. Let's use them.

Here is a very simple example.

Let's say I have a birth record that mentions facts about the child and his parents. I extract the info available into the following five records (using GEDCOM for syntax -- could use XML or JSON, but why confuse matters?):

0 @S1@ SOUR
  ... info needed to describe the birth certificate as a source
 
0 @I1@ INDI
  1 NAME Daniel Van Cott /Wetmore/
  1 SEX M
  1 BIRT
    2 DATE 13 November 1791
    2 PLAC New Brunswick, Canada
  1 FAMC @F1@
  1 SOUR @S1@
    2 INFO ... anything the researcher wants to add for the citation ...
  1 NOTE ...anything else the researcher wants to add for any reason at all ...
 
0 @I2@ INDI
  1 NAME John /Wetmore/
  1 SEX M
  1 BIRT
     2 PLAC Rye, Westchester County, New York, United States
  1 OCCU government surveyor
  1 FAMS @F1@
  1 SOUR @S1@
    2 INFO ... anything the researcher wants to add for the citation ...
  1 NOTE ...anything else the researcher wants to add for any reason at all ...
 
0 @I3@ INDI
  1 NAME Anna /Van Cott/
  1 SEX F
  1 BIRT
     2 PLAC Oyster Bay, New York, United States
  1 FAMS @F1@
  1 SOUR @S1@
    2 INFO ... anything the researcher wants to add for the citation ...
  1 NOTE ...anything else the researcher wants to add for any reason at all ...
 
0 @F1@ FAM
  1 HUSB @I2@
  1 WIFE @I3@
  1 CHIL @I1@
  1 SOUR @S1@
    2 INFO ... anything the researcher wants to add for the citation ...
  1 NOTE ...anything else the researcher wants to add for any reason at all ...

Here we have one source record, three evidence person records, and one evidence family record, all extracted from a single source in the real world.

Later on I discover a death certificate from Brooklyn, New York, leading to the two new records:

0 @S2@ SOUR
  ... info needed to describe a death certificate from Brooklyn, New York
 
0 @I5@ INDI
  1 NAME Daniel C /Wetmore/
  1 SEX M
  1 DEAT
    2 DATE 13 September 1881
    2 AGE 89 years 10 months
    2 PLAC Brooklyn, Kings County, New York, United States
      3 ADDR 75 Saint Marks Avenue
    2 CAUS renal failure
  1 SOUR @S2@
    2 INFO ... anything the researcher wants to add for the citation ...
  1 NOTE ...anything else the researcher wants to add for any reason at all ...

I now have four evidence person records, two for a Daniel Wetmore. Let's make this very simple and decide that these two Daniel Wetmores are the same person. They in fact are, but I have 40 or more records that follow his entire life from New Brunswick to Brooklyn, so I know this. To just combine what I have shown here would not be such a good thing to do.

Before I join them consider this. One record has his birth info. The other has his death info. There is no overlap between the two. (Let's say his birth place was marked as unknown on his death certificate.) The other thing to note is that his name is different in the two records.

Here is how I would build the conclusion person record for him:

0 @I6@ INDI
  1 NAME Daniel Van Cott /Wetmore/  <<-- the name is given in the conclusion person to specify which name from the evidence persons is to be preferred.
    2 INFO ... research note on why this is the preferred of the two names ...
  1 INDI @I1@ <<-- the evidence person taken from the birth record ... birth record is inherited from "below"
  1 INDI @I5@ <<-- the evidence person taken from the death certificate ... death record is inherited from "below"
  1 SOUR
    2 INFO ... conclusion statement describing why this conclusion person was built from the two evidence persons ... pick another tag or make it a shared note if desired.

Note that there is no sex, birth or death information given in the conclusion person. All this is "inherited" directly from the evidence records.

Note that this conclusion person also inherits the family he is in with John Wetmore and Anna Van Cott being his parents.
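
The inheritance described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the proposal itself: the Person class, its facts and parts fields, and the inherited function are hypothetical names, and the contents of the birth record are left as placeholders because they are not repeated here.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    """One record type used for both evidence and conclusion persons (assumed)."""
    rid: str
    facts: dict = field(default_factory=dict)  # tag -> value stored on this record
    parts: list = field(default_factory=list)  # lower-level persons (the "1 INDI @..@" lines)


def inherited(person):
    """Gather facts from the records below, then let this record's own facts win."""
    merged = {}
    for part in person.parts:
        for tag, value in inherited(part).items():
            merged.setdefault(tag, value)  # the first record seen supplies the value
    merged.update(person.facts)            # explicit choices at this level override
    return merged


# Placeholder values stand in for the details of the birth record.
i1 = Person("I1", {"NAME": "(name as given in the birth record)",
                   "BIRT": "(details from the birth record)"})
i5 = Person("I5", {"NAME": "Daniel C /Wetmore/", "SEX": "M",
                   "DEAT": "13 September 1881"})
i6 = Person("I6", {"NAME": "Daniel Van Cott /Wetmore/"}, [i1, i5])

view = inherited(i6)  # what a user interface could show for the conclusion person
```

Only the preferred name is stored on the conclusion record; everything else in view is found by walking down to the evidence records.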

How much of this does the user of the software have to know about? Very little in fact. The user interface never shows him/her this ugly stuff. He/she just sees a nice Daniel Van Cott /Wetmore/ with a nice birth record and a nice death record. If this Daniel Wetmore needs to be picked apart in the future there will have to be a user interface screen to show the structure of the person, but that is necessary, and it still wouldn't have ugly GEDCOM tags to deal with.

Tom Wetmore

GeneJ 2011-04-16T21:34:33-07:00

Approaches to the E&C BetterGEDCOM decision

In the Developers Meeting of 11 April 2011, it was proposed that all other progress stop until we have made a decision on the E&C part of the model.

See related discussion about how we finalize such an E&C proposal, "What will it take ..."
http://bettergedcom.wikispaces.com/message/view/Defining+E%26C+for+BetterGEDCOM/37787542

gthorud 2011-04-17T07:57:20-07:00

I am not sure if this is the correct topic to post this, but I will give it a try.

In the last Developers meeting we got the message that there was an urgent need to discuss the E&C model.

The model has been discussed in a huge number of discussions all over the wiki. There is no way to find those discussions.

Also, there is no central place to post issues related to the model from now on. Where do we "register" a topic about E&C to be discussed?

Neither is there any place to record conclusions as such a discussion proceeds. We just have a lot of discussions scattered around so you have no idea about which problems have been solved.

Also, it seems like newcomers have a problem finding a proper description of the model.

If we are going to discuss this issue, we need to organize around a page - with a moderator who will accept having all views reflected on the page. If we can't get organized, I will find other topics to work on.

AdrianB38 2011-04-17T13:15:58-07:00

Differences from FS personas?

If some people are reluctant for various reasons to put their trust in the E&C Model with its separate evidence persons and conclusion persons, would it be at all useful to describe how close the persona concept is in the (new?) Family Search trees to that of the evidence person? And what the differences between the 2 schemes are?

If personas turn people off just as much as evidence persons, then it would indeed be a waste of time.

ttwetmore 2011-04-17T13:58:42-07:00

Adrian says:

"If some people are reluctant for various reasons to put their trust in the E&C Model with its separate evidence persons and conclusion persons, would it be at all useful to describe how close the persona concept is in the (new?) Family Search trees to that of the evidence person? And what the differences between the 2 schemes are?

"If personas turn people off just as much as evidence persons, then it would indeed be a waste of time."

Adrian,

The NewFamilySearch approach uses two data types, the persona and the person. In the best of all worlds all personas would be evidence persons, but there is nothing in NewFamilySearch that requires this. Basically users enter personas and there are no restrictions on that process. However, it was the clear intention of the NFS that persona records be used for evidence persons, and person records be used for conclusion persons.

If the personas were indeed all evidence persons then NewFamilySearch would implement a full-bodied, two-tier E&C model and process.

So NFS is a good example of E&C in use, but by using two data structures, persona and person, NewFamilySearch locks you into a two-tier process. Basically you do your E&C process on NFS by thinking of each person as a bag of personas, and the E&C process is basically rearranging personas into different bags.

So if it would make certain people feel a little more comfortable about the E&C process by knowing that the LDS is using it themselves in their latest major product, then there you are; they are. However many people have a sour taste in their mouths about systems like NFS, since these systems are open to the public and they fill up quickly with junk. Note that NFS, in order to get more usage, is very forgiving about source information. Whenever you enter a new persona you are able to enter very detailed source information for the persona, but you can be very lazy and not do it. And if you browse through the data you learn that very few people add that information.

The NFS user interface hides the complexity of the two tier system from the users. They never encounter the word persona, or a user interface screen where they are invited to enter a persona. When an NFS user enters a new person he/she enters what seems like a perfectly normal "person" through a simple user interface. What they REALLY do, is create a persona record that gets immediately put into a person bag of size one. So each persona starts off as both an evidence record that is also a singleton within a conclusion record. Later steps can take that persona out of its initial bag and put it in another bag, or take other personas out of other bags and add them to this bag. Sounds a little complicated, but the user interface takes all the complexity out of it -- it simply looks like you are grouping and ungrouping persons together. There is a lot to be said positive about that interface.
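
The "bag of personas" picture can be sketched as follows. This is a minimal illustration, not how NewFamilySearch is actually implemented: the Tree class, its method names, and the persona ids are all hypothetical, and the only assumption taken from the description above is that every persona sits in exactly one person bag.

```python
class Tree:
    """Persons are bags (sets) of persona ids; each persona is in exactly one bag."""

    def __init__(self):
        self.bags = {}     # person id -> set of persona ids
        self.bag_of = {}   # persona id -> person id
        self._next = 1

    def add_persona(self, persona_id):
        """Entering a 'person' really creates a persona in a new bag of size one."""
        person_id = f"P{self._next}"
        self._next += 1
        self.bags[person_id] = {persona_id}
        self.bag_of[persona_id] = person_id
        return person_id

    def _take_out(self, persona_id):
        source = self.bag_of[persona_id]
        self.bags[source].discard(persona_id)
        if not self.bags[source]:
            del self.bags[source]  # an emptied bag disappears

    def move(self, persona_id, target_person_id):
        """Take a persona out of its current bag and put it in another bag."""
        if self.bag_of[persona_id] == target_person_id:
            return
        self._take_out(persona_id)
        self.bags[target_person_id].add(persona_id)
        self.bag_of[persona_id] = target_person_id

    def split(self, persona_id):
        """Undo a grouping by giving the persona a fresh singleton bag."""
        self._take_out(persona_id)
        return self.add_persona(persona_id)


tree = Tree()
p_birth = tree.add_persona("birth-record")
p_death = tree.add_persona("death-certificate")
tree.move("death-certificate", p_birth)  # "these two are the same person"
```

Grouping and ungrouping never edit a persona; they only change which bag it is in, which is why the grouping can always be undone with split.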

My only objection to the NFS approach is the fact that by using two different data types you box yourself into a corner that only allows two-tier E&C models. If the same data type were to be used for both, then there is no limit to the number of tiers one can construct. Remember my example of the Daniel Wetmores from Yarmouth, Nova Scotia, and the Daniel Wetmores from southeastern Connecticut? This is very naturally a three-tier system. Using NFS I would have to put all the Daniel Wetmore personas from the two places into the same bag. It would be a better model of the research process if I didn't have to do that.

All this being said, anyone who is confused and afraid of the E&C model and process still is. I'm sure the apparent complexity I just explained is not going to make anyone who doesn't like the idea suddenly change their mind. You don't really see the value in this process until you need to cross the chasm into records-based genealogy. But everyone reaches that point eventually, so time is on our side.

Note that plain old simple conclusion-based genealogy is nothing more than a one-tier system. Thus if Better GEDCOM decides to support E&C at the two-tier or the many-tier levels, conclusion-based genealogy comes for free.

Tom Wetmore

AdrianB38 2011-04-17T14:17:23-07:00

Thanks Tom

I confess in my pre-BG life, I always got turned off by "personas" because I couldn't see how they differed from "persons". And once I read your stuff, it became clear that there is NO difference in their data content. It's just how they're used.

GeneJ 2011-04-21T09:40:39-07:00

See also, _The Ancestry Insider_, "The Evidence Architecture of the New FamilySearch Tree," 21 Jun 2010, http://ancestryinsider.blogspot.com/2010/06/evidence-architecture-of-new.html

testuser42 2011-04-19T14:42:39-07:00

Multi-Level vs Single-Level

Geir is right, we should get to details in the implementation of a multi-level (E&C) Model. Try to find problematic areas and see if there are solutions. I think this page is a good place to do the work - so I'll start the discussion here.

From Geir:
http://bettergedcom.wikispaces.com/message/view/Developers+Meeting/37811606#37822084

Take for example the issue of exchanging data between E/C-models using different number of levels (DeadEnd vs Gentech) and the simple one level Conclusion model used by all? personal programs today. I am not convinced we have solved that issue, and it is not solved by one person.

I thought about how it might be handled and started writing it down - but this stuff is easier drawn than written, so here you go:

Full-res-PDF: multilevel.pdf
This is how I see the two models comparing. I think it's not impossibly hard to morph one into the other via software. And when you allow a Person record to have any number of sources to any of their PFACTs, then both can peacefully coexist.
Or are there hidden pitfalls somewhere?

AdrianB38 2011-04-21T12:41:23-07:00

I said "a pointer to any superseded PFACT on the level below" and TestUser said "That would work fine. Is it better than 'copying up' the preferred value?"

The reason I advocate leaving the stuff down on the lower level (unless there's a clash) is that by leaving it there, we don't introduce any complications when something like a family or an event points to the lower person (etc). E.g. if there's a birth event for one of John Doe's children, then the birth event will refer to John-Doe-Layer1A (as a father) to start with. When we introduce John-Doe-Layer2, then what do we do with the birth event? In the 'leaving it as it is' scenario that I advocate, well, nothing happens to it - leave it pointing to '1A - then the software can climb the chain to '2 if it needs to go from the birth event to the current father.

Altering it to point to '2 creates a minefield of knock-on changes or data loss or 2 birth-fathers....
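
"Climbing the chain" can be sketched like this. It is a minimal illustration that assumes each lower-level record carries an upward pointer to the record that supersedes it; the superseded_by table and the current_person function are hypothetical names, not anything defined for BG.

```python
# Upward pointers: lower-level record -> the record that supersedes it (assumed layout).
superseded_by = {
    "John-Doe-Layer1A": "John-Doe-Layer2",
    "John-Doe-Layer1B": "John-Doe-Layer2",
}

# The birth event is never edited; it keeps pointing at the layer-1 father.
birth_event = {"type": "birth", "father": "John-Doe-Layer1A"}


def current_person(record_id, links):
    """Follow upward pointers until a record with no successor is reached."""
    seen = set()
    while record_id in links:
        if record_id in seen:
            raise ValueError(f"cycle in supersession chain at {record_id}")
        seen.add(record_id)
        record_id = links[record_id]
    return record_id


current_father = current_person(birth_event["father"], superseded_by)
```

Creating John-Doe-Layer2 adds entries to superseded_by and touches nothing else, so there are no knock-on changes to events or families.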

"if you conclude that a PFACT is wrong, but you have no new value for it ... explicitly adding an empty PFACT?" Err - possibly. But I worry that other software might well use an empty PFACT to indicate that the event took place, just that no details are known. So doing an explicit Cancel seems more robust. (I think the GEDCOM standard does say something about how events that took place, but about which no details are known, should be encoded and it's not an empty tag. I think... BUT - do I believe all developers will read the manual?? <grin>)

testuser42 2011-04-21T14:36:23-07:00

Adrian, I agree completely.

Of course "stuff" should be left in the lower level where it is - UNLESS there is a clash.
Only then is anything like pointing down or copying up necessary.
If there's no clash, the conclusion record can be completely empty except for the reasoning behind the conclusion (I think these are the same because...). All the PFACTs are found by going down the tree.

If PFACTs don't clash, they are all accepted as the current conclusion.
I suggest there may be any number of "Occupations" or "Residences", but also "Births" or "Deaths"!
They don't clash, they just coexist as equally possible - until you decide on a favourite.
When you pick a favourite, you record that in the topmost conclusion record.
Technically, this may be done through a pointer to the favourite, or by repeating the favourite.

Arguments for "pointing":
+ you directly see where the preferred PFACT comes from (only if there was a clash!)
+ this makes it easy to cancel that connection
Arguments for "copying":
+ Humans reading the BG file get the preferred PFACTs spelled out in the topmost person (only when there was a clash!)
+ ?
So now I tend to like pointers more... but it's really not much of a difference.
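
A minimal sketch of the "pointing" variant, under the assumption that each PFACT has an id the conclusion record can refer to. The PFact and Conclusion classes, the preferred field, and the sample values are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class PFact:
    fid: str
    kind: str
    value: str


@dataclass
class Conclusion:
    parts: list                                    # one list of PFacts per lower-level person
    preferred: dict = field(default_factory=dict)  # kind -> fid of the favourite

    def all_facts(self, kind):
        return [f for part in self.parts for f in part if f.kind == kind]

    def current(self, kind):
        """No favourite: every candidate stands. Favourite picked: only that one."""
        facts = self.all_facts(kind)
        fid = self.preferred.get(kind)
        if fid is None:
            return facts
        return [f for f in facts if f.fid == fid]


e1 = [PFact("f1", "Birth", "date A"), PFact("f2", "Residence", "place X")]
e2 = [PFact("f3", "Birth", "date B")]
c = Conclusion([e1, e2])

both = c.current("Birth")    # the two births coexist until a favourite is chosen
c.preferred["Birth"] = "f3"  # record the favourite in the topmost record
chosen = c.current("Birth")
```

Cancelling the choice is just deleting the entry from preferred; nothing on the lower level is copied or changed.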

AdrianB38 2011-04-22T04:40:17-07:00

I hesitate to add another post when we're in agreement(!) but I would say that there is a definite difference between pointers and copying up when we consider resolving between different birth events (say). Copying up gives us 2 children (albeit one a superseded one) pointing at the same multi-person birth event. This sounds a risk to me. I'd far rather have just one child pointing at the same multi-person birth event.

(Ignore twins - that's a different thing - I mean two children when it's a single birth.).

Like I say, we're in agreement but I think you're dangerously near an edge! <grin>

gthorud 2011-04-22T09:09:04-07:00

I have been holding back on discussions since it is nice weather, and we do not have the appropriate pages in place. But since it is Easter, it seems like people have a lot of time to discuss.

I was aware of the problem with links to families/groups – that was discussed not long ago, but wanted to point out the problem with multiple valid occurrences of events of the same type. It has been demonstrated that simply copying up does not solve a conflict for repeatable events.

Adrian wrote: “Any question about the parent John-Doe-Layer1A in Doe-Family-Layer1 can be answered by going to John-Doe-Layer1A, passing up to John-Doe-Layer2, and then down to his constituent parts. (No, the navigation's no easier, but we've not lost the story...)”

Well it is not easy, and neither is it efficient if you have to traverse a 6-7 level tree. For now I note that this is not an ideal solution. You may actually prefer to have more records rather than doing a lot of processing each time you have to display the info. But, I will have to think more about this.

One solution, that we discussed some years ago when we were discussing how to assign unique ids (à la social security numbers) for all documented persons that have lived in Norway, is to have a clearinghouse for person ids – you have a table with all person ids (EP or CP) and a pointer to a location where you store the id of the current top level CP for that person. The “clearinghouse” can be a purely internal thing to an application, not needed in the BG file (you can build it on import), but we could base our discussions on such a thing being in place. I am sure such clearinghouses are already in use in many applications, the idea is not very “original”. With a clearinghouse it may not matter if you copy (by reference) upwards or not.
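
A clearinghouse of this kind could look roughly like the sketch below. It assumes the file gives each lower-level record an upward pointer to the record that supersedes it and that those pointers never form a cycle; build_clearinghouse, record_merge, and the ids are hypothetical names for illustration only.

```python
def build_clearinghouse(superseded_by, all_ids):
    """Map every person id (EP or CP) to the id of its current top-level CP."""
    top = {}
    for pid in all_ids:
        chain, cur = [], pid
        # Climb until we reach a root or a record whose answer is already known.
        while cur in superseded_by and cur not in top:
            if cur in chain:
                raise ValueError(f"cycle in supersession links at {cur}")
            chain.append(cur)
            cur = superseded_by[cur]
        root = top.get(cur, cur)
        for member in chain:
            top[member] = root
        top.setdefault(cur, root)
    return top


def record_merge(top, superseded_by, old_tops, new_cp):
    """Keep the table current when some top-level records are merged into new_cp."""
    for old in old_tops:
        superseded_by[old] = new_cp
    for pid, root in top.items():
        if root in old_tops:
            top[pid] = new_cp
    top[new_cp] = new_cp


links = {"EP1": "CP1", "EP2": "CP1"}
ids = ["EP1", "EP2", "EP3", "CP1"]
table = build_clearinghouse(links, ids)            # built once, on import
record_merge(table, links, {"CP1", "EP3"}, "CP2")  # maintained as updates are made
```

With the table in place, finding the current person for any record is a single lookup, however many levels the tree has.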

Re. the proposal to add a reference to a lower level with a possible cancel option. I wonder why the reference is needed? Is it to record which changes were made when creating the new CP? I am not necessarily disputing it, but want to know exactly why. Could you not simply record a “canceled by a higher level” flag against the canceled event (or rather reference to an event record), and remove that if the EP with that event is later separated from the CP?

If you want a reference, and considering that events may be stored in separate records, the reference should rather point to the event record, and the reference to that from the EP previously recorded could get a “copied” flag so the record is not “counted” twice. The flag would be either copied, canceled or null. (It is difficult to point to a pointer inside a record. But there is a difference introduced, because you lose that trace down the tree, that maybe could be important if you have more than 2 links to the event record.)

Also, assuming there must be a reference, must the user decide how to record the info? It is a complex thing.

Regarding a split window for merging persons. This exists in RootsMagic at the moment, except you cannot choose the facts to include from each person, you have to edit the merged person afterwards. But, considering that you will have to do a merge every time you find something in a new source for a previously recorded person, I don’t think I want to have a user interface that requires me to use the merging window every time I am going to enter new info – very cumbersome. I would rather see an interface where I add info to a CP and the EP is created in the background. I am still asking myself what an efficient user interface would look like.

A remaining issue might be if you for some reason have chosen some “primary” events (note several primary), but want to mention one (or more) of the canceled ones – stating for example that it was earlier assumed to be correct, but is not. This should perhaps be discussed separately.

AdrianB38 2011-04-22T12:06:33-07:00

"You may actually prefer to have more records rather than doing a lot of processing each time you have to display the info" - yes, I would myself, but I have this horrible feeling that if we are to retain the previous individual (in a superseded sense), then we must also retain (untouched) the previous event, previous group / family / previous location, etc. So we might very well with one individual need to alter their matching family which needs to alter the members in that family which... And so on, ad infinitum (or ad nauseam). It might NOT be like that, depending on which end of the data holds the foreign keys, but then I start thinking that native GEDCOM will be affected differently from an RDBMS because GEDCOM can describe many to many relationships inside entities (think many people in many families, all inside the family entity in GEDCOM). And I start seriously worrying about the difficulties of the _physical_ design, so I'd rather have a slow logical design _if_ it gave rise to more robust physical designs.

Re the clearing house idea - I'm not sure this gets you anywhere since you still don't know who was living in X-land in 1700.

"Could you not simply record a “cancelled by a higher level” flag against the cancelled event" Probably, yes. In fact, I've been trying to deliberately not specify which end of a relation has the pointer / foreign key, so if I gave the impression I thought it should be the other way round _physically_, that was accidental. Given that the superseding will be from one or many on the lower level to one on the higher, then having the pointer physically on the lower level seems the only way, actually.

"Also, assuming there must be a reference, must the user decide how to record the info?" Sorry - not sure what you mean here.

"I don’t think I want to have a user interface that requires me to use the merging window every time I am going to enter new info"

Thinking of the Merge window in Family Historian, me neither. But, maybe that's because the merge window is designed to merge any size of data for any reason. If I think about how an application might work, I think the "merge" steps might be easier if dedicated to a specific task.

Aside......
(One thing I am moving towards - I have previously said that anyone could decide whether to work in Conclusion-only mode, destructively merging new data into the old, or in Evidence & Conclusion mode, where the old is retained untouched ready for roll-back etc. The complexity of combination makes me believe that the user should NOT get a choice - it should work like newFamilySearch where the user has no idea that their input is being stored in separate personas and merged into persons (to use their terms). If software offers E&C, it might as well use it for all stuff given that it should be hidden. Which has all sorts of implications for what is exported...)
Aside ends........

If I think through how data would be entered into an E&C program, then I think it should conform more closely to the spirit of the Genealogy Research Process bubble-chart (rendered by Mark Tucker).

1. Tell the software to add a new Source-record.
2. Add the author, publisher, etc of the Source to the Source-record.
3. Add an unstructured transcript (if you want) of the source or a link to an image of same.
4. Add a structured version of the text of the source, marked up to say that this is a person, whose name is X, whose baptism (say) is, whose parents (say) are... You enter this on a form that looks exactly like a person record (or family or group or location, etc).
5. You press <enter> and as far as you are concerned you've entered a source record with a marked up version of its data. (PLEASE can we not argue whether this is data, information or evidence).
6. Behind the scenes, the app has created a Source record and a Person record that relates to no-one other than the person / people from this Source. In fact, it's a Person that is acting as an "Evidence person".
7. You then get a dialog box asking you which person already in the database, if any, this guy from the Source matches to.
8. You either press the button for none-of-the-above or click the right person to merge with.
Yes, then it becomes trickier!!!
9. If none of the PFACT types on the new E-Person match the ones on the matching one (apart from the name), then it just becomes a list of "These are the new values to be added, OK Y/N". If any types match, then it becomes a dialog one after the other - "Add new Occupation Y/N?" "OK not adding but now what?" - but I'm not sure this is any worse than me trying to remember which items to add. Well, maybe it's longer.
10. Don't pretend this is easy. It isn't, of course, our job to design the app, but that's an excuse - if we design something where the data is too complex to add, we're stuck, so we need to have some idea that it's feasible.
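
Step 9 is the part that most needs checking for feasibility, so here is a rough sketch of the sorting it implies. It assumes PFACTs can be compared by type alone and that the name is handled by the matching in steps 7 and 8; plan_merge and the sample values are hypothetical.

```python
def plan_merge(existing, incoming):
    """Split the new evidence person's PFACTs into automatic additions and questions.

    existing, incoming: lists of (pfact_type, value) pairs.
    """
    existing_types = {t for t, _ in existing if t != "Name"}
    to_add, to_ask = [], []
    for pfact_type, value in incoming:
        if pfact_type == "Name":
            continue                             # the name was used to find the match
        if pfact_type in existing_types:
            to_ask.append((pfact_type, value))   # "Add new Occupation Y/N?"
        else:
            to_add.append((pfact_type, value))   # goes on the "new values" list
    return to_add, to_ask


existing = [("Name", "X"), ("Occupation", "value A")]
incoming = [("Name", "X"), ("Occupation", "value B"), ("Baptism", "value C")]
to_add, to_ask = plan_merge(existing, incoming)
```

Here the baptism lands on the single "OK Y/N" list and only the occupation needs a question of its own, so the dialog is no longer than the number of clashing types.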

testuser42 2011-04-22T12:36:36-07:00

Adrian, good thoughts about the app. Should we create a page to collect all the ideas about apps that have been coming up? I know we're not designing a program, but there have been good proposals how the power of a BG could be used in future applications.
About the aside, I think you're right. No user will care how his application stores things on export (much less how it works internally), as long as the data is faithfully transported to other software. So any app exporting to BG should use the non-destructive multi-level approach. If there's a need to convert a BG to GEDCOM, there would be tools for that, that will do the "compressing" of the Levels, as best as it can be done.

testuser42 2011-04-22T12:43:56-07:00

Geir's "clearing house" idea --
Do you mean after first parsing of the BG, a table is created internally in the application that stores the UUIDs of Records that are connected?
That seems like a good routine for a software, could speed up searching. But as you say, it's an internal thing of the software's database.
The relevance to BG may be that we don't have to be afraid of big trees of Records - there will be ways to follow them...
(But I'd really like to hear Louis or Tom about that - they've the experience)

testuser42 2011-04-22T12:54:50-07:00

Adrian said:

...difference between pointers and copying up when we consider resolving between different birth events (say). Copying up gives us 2 children (albeit one a superseded one) pointing at the same multi-person birth event. This sounds a risk to me. I'd far rather have just one child pointing at the same multi-person birth event.

Yeah, that's a valid concern.
I'm already in favour of pointers, but just to play devil's advocate:
If there was a rule that says "identical things are only counted as one" -- that would solve that, wouldn't it? If you just copy up the preferred "thing" without changing anything, then it's a duplicate, and counts as one. If it's changed, it's a new thing anyway.

AdrianB38 2011-04-23T12:30:12-07:00

"If there was a rule that says "identical things are only counted as one" -- that would solve that, wouldn't it?"

Not sure - I don't think it's the numbers that worry me so much as the navigation. If you got to that birth event (from a parent, say), and you wanted to go on to the child - what happens? You've got a choice. Either you treat them as twins and eventually it falls out that they're the same top-level person, or you're in real trouble.

Twins makes sense, so that's possible - but what if the role of the person you're trying to get to is the birth mother? Two birth-mothers? Sounds guaranteed to trip something up...

gthorud 2011-04-23T17:19:48-07:00

I have still not got an answer as to exactly why we need the link to the lower/higher level event. What will we use it for? One possibility is to see why an event has been canceled (the reason being recorded at some higher level) so you want to point upwards. The other might be to see the basis for an event that is the result of merging two events (see below), but I am not sure, it might get complicated if you are always going to link to the basis for all sorts of conclusions (other than pointing to the next lower level persons). In any case, downward links will have to link to the event records, with a flag attached to the link, as I have described above. Two way links are usually an application thing only. Other reasons for these upward/downward links?
---
Adrian wrote: “One thing I am moving towards - I have previously said that anyone could decide whether to work in Conclusion-only mode …. “

Considering that you are very likely to import “Conclusion only data” the user will have to experience some inconsistency, e.g. can’t always do a rollback on that data. Also, 98% of programs are conclusion mode only, and they will want to keep that as one option (few will risk to enforce E&C unless you can hide it). It is therefore a question if they have to add an E&C mode, which may add restrictions on how to work (particularly one source per E-person), or if it is possible to operate as before in conclusion mode as seen by the user – and the program will store the modified data as in E&C mode? I am not sure the latter is possible.

I think it might be useful, as testuser suggests, to collect some UI/application considerations, possibly as an annex to a specification – that can be thrown away later if so decided.

Adrian wrote “9. If none of the PFACT types on the new E-Person match the ones on the matching one (apart from the name), then it just becomes a list of "These are the new values to be added, OK Y/N". If any types match, then it becomes a dialog one after the other - "Add new Occupation Y/N?" "OK not adding but now what?" - but I'm not sure this is any worse than me trying to remember which items to add. Well, maybe it's longer.”

This is the interesting step. Adding an event is not the only option: more generally you could also cancel another one, or copy one of the two upwards and then modify it (possibly reentering some info), or, rather than copy+modify, use a merge on the two evidence events to create a conclusion event and then cancel both the “evidence events”. It could be a large number of questions asked by the program, and there can be many persons to link to (so you can’t just check if you have the event type before, you also have to consider the role), and many events from one source. The question is if the program MUST ask these questions, or if you can let the user operate on his own, and yet the program will be able to store the data correctly.

Testuser, yes, the clearinghouse is created on import and maintained as updates are made.

AdrianB38 2011-04-24T04:45:18-07:00

"why we need the link to the lower/higher level event. What will we use it for?"

Default assumption if no link is that the details of this PFACT or relationship or whatever, carry upward into the next level of this person. The link is there to say "No - something happens".

That something might be total suppression (i.e. cancel an emigration because they didn't go) or replacement by a value held on the conclusion level above.

In the physical implementation I don't think we want downward links because they're at the wrong end of a many-to-one.
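
The default-carries-upward rule can be sketched as below. This is only an illustration under the assumption that the marker is stored against the lower-level PFACT (the "many" end); the Link enum, effective_pfacts, and the sample values are hypothetical names.

```python
from enum import Enum


class Link(Enum):
    CANCELLED = "cancelled"  # suppressed entirely, e.g. an emigration that never happened
    REPLACED = "replaced"    # superseded by a value held on the conclusion level above


def effective_pfacts(lower, links, upper):
    """Return what the conclusion person shows.

    lower: {pfact_id: value} on the lower level
    links: {pfact_id: Link} for the PFACTs where "something happens"
    upper: values stored directly on the conclusion level
    """
    carried = [value for pid, value in lower.items() if pid not in links]
    return carried + list(upper)


lower = {
    "p1": "Residence: place X",
    "p2": "Emigration: (details)",
    "p3": "Birth: date A",
}
links = {"p2": Link.CANCELLED, "p3": Link.REPLACED}
upper = ["Birth: date B"]

shown = effective_pfacts(lower, links, upper)
```

An explicit CANCELLED marker is different from an empty PFACT, so other software cannot misread it as "the event took place but no details are known". Removing the entry from links restores the lower-level value.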

"the user will have to experience some inconsistency, e.g. can’t always do a roll-back on that data". Indeed. The design has to cope with having a mix of both E&C and C-only data on the one file for exactly this reason. Going back to peel my conclusions apart is not something I would ever remotely contemplate.

"restrictions on how to work (particularly one source per E-person)" - yes, I'm not sure of the benefit of this rule. Probably it's how the wizard works for entering new data, but is it even worth freezing in a rule? Particularly when you consider how you start a database in E&C mode - what will you do first? - enter your own data, of course. Will you enter a source first? No, you've never heard of them. Logically the wizard then asks you "How do you know you exist?" (Err... I think therefore....) So it might be sensible to automatically create a source for "My personal knowledge" - this is Phase 1 of Ancestry Insider's life-cycle. So, in retrospect the rule wouldn't be a problem.

"few will risk to enforce E&C unless you can hide it" This is what I am beginning to think. In fact, I'd go further - the potential complexity necessary to permit roll-back and separation is such that unless you have a brain the size of a planet, automatic processing is the only way to do it.

Which raises all sorts of issues about how soon we could ever expect to see E&C software from the big companies who aren't genealogists. (See the AI's comments on Product Managers that lead into the Chasm posting).

"This is the interesting step" - yes, as in that Chinese curse "May you live in interesting times!"

"The question is if the program MUST ask these questions, or if you can let the user operate on his own, and yet the program will be able to store the data correctly." I am tending, as I suggest above, to get the program to hold the user's hand wherever it can. An exact answer would need to step through precise cases.

gthorud 2011-04-25T13:09:24-07:00

I discovered that one thing was left out in my last post:

Testuser wrote: “identical things are only counted as one”

You don’t want to figure out if any data are the same every time you access them, so I don't think that is the way to go.

gthorud 2011-04-19T15:47:26-07:00

Maybe we should start a new discussion without the image inline?

testuser42 2011-04-19T16:09:33-07:00

Hi Geir, I changed the textcolor to 100% black.
I hope by deleting the old versions of the image, that the big one disappears...
(for me until then, Opera's "Fit to width" makes these pages readable - have you tried that famous Norwegian product :)
If nothing helps, start a new thread.

gthorud 2011-04-19T17:25:52-07:00

Thanks for the tip. The first version of that famous product was developed about 5 meters from my office, in the same department as I worked, 16-17 years ago.

gthorud 2011-04-19T17:43:28-07:00

Prints fine now. Thanks.

Re. the diagram.
When creating P3-1, what if apples (persons) can have several colours? I.e. several events of the same type as for example Residence? Do you have to copy all valid events of a "repeatable event" type upwards? How does a program know which event types are repeatable? (And what if the event is user defined?) Have we discussed this?

testuser42 2011-04-20T05:21:50-07:00

New version of the graphics up, now including a 2-tier-Model.

testuser42 2011-04-20T05:28:28-07:00

Geir says:

When creating P3-1, what if apples (persons) can have several colours? I.e. several events of the same type as for example Residence? Do you have to copy all valid events of a "repeatable event" type upwards? How does a program know which event types are repeatable? (And what if the event is user defined?) Have we discussed this?

Good question! I can't recall a discussion of this.

One thing I imagine is this: in the upper-level persons, you can edit the PFACTs any way you like. So writing "C: red and green" should be fine.

Maybe if there is NO preferred PFACT, then all the lower-level PFACTs are treated equal?
Only if you want, you pick a preferred one.
Would that be a way to solve this?

"Residence" would lend itself to be such a PFACT, and it should be sorted in any output by date (if there are dates).

gthorud 2011-04-20T06:58:21-07:00

I may be wrong, but I see no other option but to REQUIRE that all valid conclusion colours are copied upwards, and I am not sure that is in line with what Tom proposed - but I may be wrong.

And since you can not (always) know how many repeats an event type can have, you may end up copying a lot of (all?) events upward.

testuser42 2011-04-21T04:18:27-07:00

Geir: I may be wrong, but I see no other option but to REQUIRE that all valid conclusion colours are copied upwards

Why?
If they are all equally valid, they can stay where they are and just be "found" by going down the ladder.
If one is more equal than the others, this can be shown by either copying up, or by directly pointing from the top level to the preferred color.

I think it doesn't matter how many repeats an event type can have. If it is repeated in our data, then it just is. The data structure doesn't need to know if it's an alternate date or an additional residence. It's up to the user and the software to make sure that PFACTs that are preferred are indicated as such. It should actually be allowed to have more than one date and not indicate the preferred one -- if it's not clear which date is better, we need to keep all options equally visible.
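
A minimal Python sketch of the "leave them where they are and find them by going down the ladder" idea. The `PFact` and `Person` records, their field names and the ISO date strings are assumptions made up for illustration, not agreed BetterGEDCOM structures. If nothing is marked preferred, all PFACTs of that type are treated as equal and shown sorted by date.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PFact:
    kind: str                     # e.g. "Residence"
    value: str
    date: Optional[str] = None    # ISO date string if known (assumption)
    preferred: bool = False       # optional marker set by the user


@dataclass
class Person:
    name: str
    pfacts: List[PFact] = field(default_factory=list)
    lower: List["Person"] = field(default_factory=list)  # records this one is built from


def collect(person: Person, kind: str) -> List[PFact]:
    """Gather every PFACT of one kind by walking down the ladder."""
    found = [f for f in person.pfacts if f.kind == kind]
    for low in person.lower:
        found.extend(collect(low, kind))
    return found


def display(person: Person, kind: str) -> List[PFact]:
    """Show preferred PFACTs if any are marked, otherwise all of them."""
    facts = collect(person, kind)
    chosen = [f for f in facts if f.preferred] or facts
    # Dated PFACTs first in date order, undated ones last.
    return sorted(chosen, key=lambda f: (f.date is None, f.date or ""))
```

Nothing is copied upwards here: the upper-level person only holds links to the lower-level ones, and the program never needs to know whether an event type is "repeatable".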

AdrianB38 2011-04-21T04:43:53-07:00

I seriously do not like the copying of stuff upwards to the (current) conclusion level on the basis that it gets phenomenally expensive on records and therefore seems a hostage to fortune when navigating the model.

An instance of a navigation problem is:
- I have John-Doe-Layer1A
- John-Doe-Layer1A is a parent member of family Doe-Family-Layer1.
- I also have John-Doe-Layer1B

I then determine that John-Doe-Layer1A and John-Doe-Layer1B are the same person, so my software creates John-Doe-Layer2.

Question - what happens to John's relationship to the family? If we copy everything "up" to the next layer, then John-Doe-Layer2 is recorded as a member of Doe-Family-Layer1. Thus Doe-Family-Layer1 now has 2 parents named John Doe, viz: John-Doe-Layer1A and John-Doe-Layer2.

This scares me in two ways: firstly having multiple members (e.g. multiple fathers) in the same family. Secondly because this breaks the whole point of retaining unaltered the full chain of data that E&C Model is supposed to do. Specifically in this case because I now have Doe-Family-Layer1 containing both John-Doe-Layer1A and John-Doe-Layer2 and I've lost the earlier conclusion that Doe-Family-Layer1 contained just John-Doe-Layer1A.

The obvious thing would be to create Doe-Family-Layer2 but how long will this go on?

So I'd suggest we don't copy up... In this case we have in the database, after the creation of John-Doe-Layer2:
- John-Doe-Layer1A
- John-Doe-Layer1A is a parent member of family Doe-Family-Layer1.
- John-Doe-Layer1B
- John-Doe-Layer2 with pointers that say this record replaced John-Doe-Layer1A and '1B.

Any question about which family John-Doe-Layer2 belongs to can be answered by looking to John-Doe-Layer1A and '1B.

Any question about the parent John-Doe-Layer1A in Doe-Family-Layer1 can be answered by going to John-Doe-Layer1A, passing up to John-Doe-Layer2, and then down to his constituent parts. (No, the navigation's no easier, but we've not lost the story...)
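
A rough Python sketch of this navigation, assuming a hypothetical `PersonRecord` with a `replaces` list and a plain dict mapping each family to its parent records (both invented for illustration). The family record is never touched when John-Doe-Layer2 is created.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class PersonRecord:
    rec_id: str
    replaces: List[str] = field(default_factory=list)  # ids of the records this one replaced


def families_of(rec_id: str,
                persons: Dict[str, PersonRecord],
                family_parents: Dict[str, Set[str]]) -> Set[str]:
    """Families in which rec_id, or any record it replaced, is a parent."""
    result: Set[str] = set()
    seen: Set[str] = set()
    stack = [rec_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        for family, parents in family_parents.items():
            if current in parents:
                result.add(family)
        stack.extend(persons[current].replaces)  # go down to the constituent parts
    return result


def current_conclusion(rec_id: str, persons: Dict[str, PersonRecord]) -> str:
    """Pass up from a record to the topmost record that replaced it.

    Assumes each record is replaced by at most one other and there are no cycles.
    """
    current = rec_id
    while True:
        above = [p.rec_id for p in persons.values() if current in p.replaces]
        if not above:
            return current
        current = above[0]
```

With the John Doe records above, `families_of("John-Doe-Layer2", ...)` finds Doe-Family-Layer1 through John-Doe-Layer1A, while the family still lists only John-Doe-Layer1A as its parent member.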

OK - so, how do we deal with the problem of, say, John-Doe-Layer1A, John-Doe-Layer1B all having different occupations, and John-Doe-Layer2 having just a subset because the merge said "OK - we acknowledge that occupation is plain wrong - they confused it with his father's on the source document"

I think we can do it with these rules:
- any occupation explicitly linked to John-Doe-Layer2 is "true";
- any occupation explicitly linked to John-Doe-Layer1A or '1B is "true" for John-Doe-Layer2 as well UNLESS it's linked to an occupation on John-Doe-Layer2 that either gives a different value or "cancels" it out.

This needs 2 new items on the higher PFACT -
- a pointer to any superseded PFACT on the level below
- a marker that says "Cancel" the lower level PFACT. If Cancel marker is not present, then it means this level supersedes the lower level PFACT.
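
The two rules and the two new items can be sketched in Python as follows. This is only an illustration under stated assumptions: the `PFact` and `Person` classes, the `supersedes` pointer list and the `cancel` flag are hypothetical names for the items described above, not an agreed structure.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PFact:
    fact_id: str
    kind: str                                   # e.g. "Occupation"
    value: Optional[str] = None
    supersedes: List[str] = field(default_factory=list)  # ids of lower-level PFACTs
    cancel: bool = False                        # True: suppress the lower PFACTs, add no value


@dataclass
class Person:
    rec_id: str
    pfacts: List[PFact] = field(default_factory=list)
    lower: List["Person"] = field(default_factory=list)


def effective(person: Person, kind: str) -> List[PFact]:
    """PFACTs of one kind that are "true" for this person."""
    own = [f for f in person.pfacts if f.kind == kind]
    # Every lower-level PFACT pointed at from this level is replaced or cancelled.
    overridden = {fid for f in own for fid in f.supersedes}
    # Rule 1: anything explicitly linked at this level is true (cancel markers carry no value).
    result = [f for f in own if not f.cancel]
    # Rule 2: lower-level PFACTs stay true unless this level overrides them.
    for low in person.lower:
        result.extend(f for f in effective(low, kind) if f.fact_id not in overridden)
    return result
```

The lower-level records are left untouched, so asking the same question of John-Doe-Layer1B still returns everything originally recorded there, including the occupation that the merge decided was wrong.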

AdrianB38 2011-04-21T04:46:58-07:00

I can imagine this in an application - 2 screens that line up when you want to logically "merge" the 2 John Does. Mark up which PFACTs go forward to the merged guy, and which you want to suppress.

In the application it looks like you ARE physically merging the 2 guys - behind the scenes, you're NOT, so you can go backwards to see previous history for anyone.

testuser42 2011-04-21T05:59:56-07:00

Adrian wrote:

So I'd suggest we don't copy up... In this case we have in the database, after the creation of John-Doe-Layer2:

- John-Doe-Layer1A
- John-Doe-Layer1A is a parent member of family Doe-Family-Layer1.
- John-Doe-Layer1B
- John-Doe-Layer2 with pointers that say this record replaced John-Doe-Layer1A and '1B.

testuser42 2011-04-21T06:04:28-07:00

Adrian said:
I can imagine this in an application - 2 screens that line up when you want to logically "merge" the 2 John Does. Mark up which PFACTs go forward to the merged guy, and which you want to suppress.
In the application it looks like you ARE physically merging the 2 guys - behind the scenes, you're NOT, so you can go backwards to see previous history for anyone.
Yes, that's about how this should work.
And if you look at John Doe, you'll always see the latest conclusions all together. His events and relationships will be there, and you don't have to know how they are found or linked. All the alternatives for any PFACT can be seen on demand, but usually you'll just get the currently preferred PFACT.

testuser42 2011-04-19T14:56:39-07:00

The image is too big. I've overwritten the image with a smaller version (1000px width instead of 1200) but it's not changing. Is the big image cached somewhere?
And the link to the PDF should be

multilevel.pdf

Hope that works...

testuser42 2011-04-19T15:08:45-07:00

oops, mistake in the one-level person P1:
The line from "B:tasty" should go to S1.
S2 only has a link to "A:apple".
The other PFACT ("C:red") of that source is lost.
I've re-uploaded both PDF and image.
I see there are "revisions" for the files - that's where the old ones are kept. I can't delete the old versions. Could an admin do that? Maybe this will work.

gthorud 2011-04-19T15:45:41-07:00

Printed the PDF on a b&w laser printer. Most of the text was in light gray, barely readable.

Also, here we go again with long lines because of the image. Nothing to do with it, posted is posted.

mmartineau 2011-04-19T22:58:31-07:00

PROPOSAL: BetterGedcom E&C Wiki Organization, Rules and Guidelines

I've been asked to moderate the BetterGedcom E&C specification. It is apparent from reading many of the discussions that the users on this wiki have a broad range of experience and not everyone agrees on how to build a genealogy data exchange specification. I believe our collective experience, ideas, creativity and differing opinions can be channeled to produce the best possible specification. To enable more people to contribute in meaningful and effective ways I propose the following rules, structure and guidelines for contributing to BetterGedcom E&C model:

BetterGedcom E&C Specification Wiki Pages Organization

The pages in the BetterGedcom E&C part of the Wiki will be organized in the following way to help facilitate the creation of the specification:

Main Page
This page will contain summary information about the specification, the rules and guidelines on how to contribute to the specification (essentially this post) and an index and summary of specification proposals (see below). It will have links to the following sub-pages:

Official BetterGedcom E&C Specification
This page will contain the official specification and links to sub-pages of the specification.
Specification proposal discussions (see below) will happen here.

Prioritization of Requirements
To help the BetterGedcom community stay focused, requirement priorities will be debated here. (This actually might fit better if it were incorporated into the Better GEDCOM Requirements Catalog part of the Wiki)

Specification Brainstorm
This page will contain links to pages created by anyone who wants to brainstorm an idea and get feedback from the BetterGedcom community. The general outcome of a brainstorm will hopefully produce an official proposal to solve an existing requirement or create new requirements.

Archived Stuff
This page contains links/content no longer included in the current specification, but still valuable as a reference, etc.

Official Specification Discussions

There will be 2 types of discussions on the Official BetterGedcom E&C Specification page and its sub-pages. Proposals and questions/comments. Proposals will be focused discussions that debate specific solutions for inclusion in the BetterGedcom specification. Each proposal discussion will focus on a solution to a requirement. All other discussions will be normal question/comment type discussions regarding the current official specification.

Proposals

A proposal discussion presents a solution to a requirement as defined in the Better GEDCOM Requirements Catalog.
The subject of a proposal discussion must start with "PROPOSAL:" so that users can quickly determine which discussions directly influence the standard.
A link to the requirement the proposal addresses should be included.
Only requirements previously approved by BetterGedcom wiki members can be candidates for a proposed solution. Debate on whether a requirement should be included in BetterGedcom should be debated on the Better GEDCOM Requirements Catalog discussion board.
Debate within the discussion must stay focused on the requirement and the solution. When an opposing argument is made against the proposed solution, it is preferred that a counter solution is presented that corrects the debated weakness of the proposed solution.
The result of the proposal after adequate debate is either adoption of the solution or a counter solution into the official specification, or postponement due to issues uncovered during the debate (such as some other requirement must be solved first to adequately solve this requirement).
The ultimate solution must be able to accurately represent all data defined in simple, moderate and complex case studies (see below) and satisfy the linked requirement.
If, after adequate debate, two or more solutions can pass the requirement and case study test and a consensus has not been reached, a vote will be taken to determine which solution will be adopted by BetterGedcom.
Once adopted, the solution will be incorporated into the official standard on the main specification page or sub-pages.
If the solution is later found to not meet requirements or accurately represent the case studies for whatever reason, a new solution can be proposed.
When debating a solution, courtesy for others should be a top priority.

Case Studies

In order to demonstrate the practical effectiveness of the BetterGedcom specification and to help avoid "my way is better just because" arguments, a series of case studies will be created ranging in complexity from simple to very complex. These case studies will demonstrate various research methodologies such as Evidence Explained and GPS as well as others. They will be maintained elsewhere on the wiki by those with experience in this area. The BetterGedcom specification must be able to accurately represent the data in the case studies. They will also provide the basis for example BetterGedcom files once the specification is complete.

GeneJ 2011-04-21T00:12:06-07:00

This wanders a bit. Tired right now, but want to get it posted.

(1) Might we have a description on the front page of what this "persona"/E&C is now, especially with _your_ take, Mike.
It started out as a model built to support a process. For a while, it was a model in search of a process. If it's really a cool but complex data model developed to support equally cool and complex algorithms supporting _general_ genealogical information (and we are just sure somewhere in there, genealogical requirements), that's fine, I'd like that to be more transparent.

Not all algorithms are equal (ala, sub-prime), so maybe there needs to be some context given to "algorithms."

(2) might we work toward a shorter term "persona"/E&C proposal with some extended options, especially those that might maximize the advantages of the "research log" and EE & GPS Support areas. See 3.

(3) Possible part of the proposal could show how the model holds up for specific, isolated circumstance. Even isolated complex circumstance.

Up until a little while ago, I thought we were going to be able to evaluate the model "within a working prototype"; then I understood a representative project would be available, and also documentation.

Without the prototype, rather than a case study, the kind of work Geir and TestUser were doing most recently has been helpful to me.

A case study is easier said than done in the short term. If it's helpful for now see the BetterGEDCOM blog (What is research ...)--those were done for the project, so folks would have real world circumstance; they just were never used. Ditto, the several evidence postings with commentary on my personal blog ("They Came Before").

Off the cuff, I can think of two of those that might be interesting, from the beginning researcher:
(a) Ancestor's obituary ... Left 11 children to mourn. If the children aren't named, how is the evidence entered--are you adding 11 surviving personas?? When you scour the vital records, if you only find 11 vital records, what happens to that first entry? When you then find a 12th record--do you create a persona for one child then deceased (or do you just divorce the family at this point)?
(b) In the body of evidence blog entry (BetterGEDCOM), there is a humorous take on the Native American Princess. While it was intended to make you chuckle, the evidence circumstance is still valid--how likely is it a beginning researcher recognizes _all_ the personas possible from any one item of evidence.

Without the prototypes, I'm not sure I learn what is stored "under the hood" and how that would be accessed. Ditto, how data is purged and, since I assume there will be one, what happens when someone hits the "over ride" button. How do these roll backs work after an extensive and complex roll up? What if there was an override?

Where are we on negative evidence?

Is it unrealistic to not only document how the "persona" model handles these various circumstances, but how it might alternatively be done WITH the Admin/research functions in place.

(4) Given whatever description we have then, aside from all the reasons we've been told developers want this, might the proposal provide a rationale for why genealogists should also want the described "persona"/E&C type structure. It would be nice to hear it without too much hype and without features not model dependent.

As some of you know, I toured a chapter of APG through the blog and wiki last week. The speech was originally going to ask some of these folks to help with the citation elements and other aspects in the EE & GPS Section. The speech was restructured after that Monday's Developers' Meeting. During the speech, we toured the BetterGEDCOM sites--getting all up close and personal with the *many Wiki descriptions* of me and my genealogical insight. Some of those on the tour have been following the postings put together for BetterGEDCOM on its blog and my personal blog. We had a long visit in EE & GPS Support and then reviewed the discussion that might dismantle that effort.

As you all probably know, APG members sign a Code of Ethics. http://www.apgen.org/ethics/index.html

The tour group was made up of genealogical speakers, teachers, authors--almost all are bloggers. They found it very interesting.

As a part of understanding why genealogists should want the persona, the folks who worked on GenTech were really sharp. When the GenTech was reviewed from the user standpoint, that group concluded it was a pretty cool model, but "It appears to be too complex for the beginner, and perhaps too tedious for the experienced researcher who can easily extract the pertinent information from a document, ignoring the boilerplate, and leap directly to a final conclusion."

When BetterGEDCOM looks at the rationale for why genealogists should want "personas," hopefully that includes identifying how handling evidence or technology has changed to make us think the process is less complex for the beginner or less tedious for the experienced researcher.

testuser42 2011-04-21T02:56:53-07:00

Case studies may be well placed in the test suite page:
http://bettergedcom.wikispaces.com/BetterGEDCOM+test+suite
http://bettergedcom.wikispaces.com/message/list/BetterGEDCOM%20test%20suite

I'll open a new discussion for Gene's example(s).

testuser42 2011-04-21T07:46:13-07:00

Discussion about GenTech's model should really be there: http://bettergedcom.wikispaces.com/message/list/GenTech+Data+Model
But anyway, as an aside:
GenTech has a 2-tier model (Personas and Persons). That's good.
Their implementation of a 2-tier-model is not the only possible way.

There are other good ideas in the GenTech model (Administration submodel might be one, though I've not looked too deeply at it yet).

BUT they have unnecessarily complicated the connections between the Records way too much. E.g., everything needs an intermediary "Assertion" Record, instead of simple links between other Records. That is not very elegant.

Overall, the ideas are good, but the GenTech implementation feels more "academical" than "practical". That doesn't mean it couldn't work. Only that it may be too complicated to implement, at least for smaller developers with less manpower.
Tom wrote a good review on the page mentioned above.

hrworth 2011-04-21T07:57:24-07:00

testuser42,

It is possible to post the Definition of Personas on this page.

http://bettergedcom.wikispaces.com/Glossary+Of+Terms

Thank you,

Russ

gthorud 2011-04-21T09:22:56-07:00

Where do we put the discussion on “Do users need this”, “Do online services need this”, “Is it too complex for users”, “Is it too tedious” – I think the overall high level discussion on that should be kept separate from the discussion of the inner workings of the model, but as part of the E&C pages. You could say this goes in requirements, but I feel there is a higher level discussion going on, and there are some “negative requirements” that are not really requirements to the model.

Re. test cases – meaning something someone could use to test conformance to the standard - that is something you start on when you know you will have a standard and have more or less finished the standard. It is most likely not the same thing as test cases for the E&C work which I feel would benefit more from documentation (reports) of real cases and evidence, that could be discussed as such and possibly also PARTLY be converted into data to demonstrate a specific issue, but should ignore the possibility of the data being used for conformance testing. I think it would be useful to at least have a list of links to E&C relevant test cases somewhere on the E&C pages.

I think an evaluation of the E&C aspects of Gentech should take place on the E&C pages, and it may reference any such discussions on the Gentech model page, so we have everything in one place. What we will have to do is compare models – there is already a page for comparisons – but as stated we should keep as much as possible in one place.

Also, we should start to enforce the rule at the bottom of this page http://bettergedcom.wikispaces.com/Guidelines+for+posting+or+editing I am tired of reading several pages just to find one paragraph that is relevant, I tend to ignore such postings.

GeneJ 2011-04-21T09:31:44-07:00

@Russ:

You wrote, post the Definition of Personas on this page.

As you point out, we don't have a BetterGEDCOM definition for persona.

It's a term used in GenTech and apparently also used in newFS.
See: http://www.ngsgenealogy.org/cs/GenTech_Projects
for the 101 page data model, pg 60 has a description, but if you search the document for the term, you'll find many, many references in context.

See: Adrians, "Differences from FS personas?"
http://bettergedcom.wikispaces.com/message/view/Defining+E%26C+for+BetterGEDCOM/37800364
And also: The Evidence Architecture of the New FamilySearch Tree at
http://ancestryinsider.blogspot.com/2010/06/evidence-architecture-of-new.html

I've started to use that term instead of Evidence Person.

hrworth 2011-04-21T11:01:10-07:00

GeneJ,

I think, like all other definitions, it belongs on the Definition Page. When looking at that page, I don't think we need to have to click to a number of links.

Just a suggestion.

Thank you,

Russ

AdrianB38 2011-04-21T11:59:09-07:00

I have just added a proposed definition to the discussion tab of Glossary Of Terms.

See http://bettergedcom.wikispaces.com/message/view/Glossary+Of+Terms/37998504

Any further evidence on nFS use will be gratefully received - please add to that discussion thread, not this. So far as I know, their Data Model has not been published externally - the FS Developer Network indicates how to use the nFS APIs and does not document the Data Model per se.

AdrianB38 2011-04-21T12:02:33-07:00

"I've started to use that term instead of Evidence Person"

Gene - I started to do that for various reasons that have nothing to do with IT and everything to do with psychology. However, having now checked the nFS usage and the GENTECH usage, they use the terms in WHOLLY different ways. So it's not a good idea.

AdrianB38 2011-04-21T13:02:07-07:00

Re "GenTech has a 2-tier model (Personas and Persons). That's good. Their implementation of a 2-tier-model is not the only possible way"

That's not the way I read the GenTech documentation - it doesn't have a Person entity and, so far as I know, can be any number of levels.

nFS has a 2-tier model (Personas and Persons).

testuser42 2011-04-21T13:51:20-07:00

Oops, you're right, Adrian. I confused GenTech and newFamilySearch here.
Just checked the GenTech Model PDF again - oh my... it is complicated!

I understood both you and Gene were using "Persona" = "Evidence Person", and so was I.

gthorud 2011-04-21T15:56:11-07:00

Please take the Persona discussion somewhere else, it has nothing to do in this topic.

gthorud 2011-04-20T17:01:25-07:00

I think the overall structure looks good (but see second last paragraph below).

If I read this correctly, we will start with requirements, then brainstorm ideas, then propose solutions, then enter the result in a draft.

The page structure should initially have a link on the left side of the wiki, similar to “EE & GPS”, and be linked to from the Rec Cat.

Re. 1.1 Specification
I assume versioned drafts will go here. Although this specification will be a separate document, it will rely on other parts of BG, so I don’t expect it to be used alone. I assume the specification will not simply specify a data structure, but also definitions and rules for use of the structure (possibly including examples).

It is also a question about where we draw a border between the E&C and other parts of BG, and how do we “interface” with them. The problem is that things are intertwined. I think we have a relatively good understanding of sources/citation structures (at least the info in them), but we will depend on some progress in the research method/administration area.

Re. 1.2 Requirements
This page should start out by identifying requirements, we will see later if prioritization is needed. I have no strong feelings about where the requirements should go, on a separate page or the Catalog. The BG Requirements Catalog is a heavy document, with a long list of discussion topics. It might be simpler to work with a separate page, using the same (or simpler) table structure, and move it to the Rec Cat when it is mature.

Re. proposals.
Even if there are opposing arguments (problems) it may not always be possible, or easy, to propose an alternative – so presenting an alternative solution should not be a requirement for raising an objection.
The BG project has tried voting before, it did not work, and with the current participation it is my opinion that it is unlikely to work – we will have to have more contributors onboard for it to work, and we also have a lot of people on board that are unlikely to understand complex technical things. As I see it, we will either have to work towards consensus or take a break after documenting the alternatives, or agree that there will be alternatives.
Discussion topics should have precise titles, not some of the “nonsense titles” that can be observed on the wiki.

Case studies
I don’t know who will be responsible for the case studies. I am not sure we will see huge case studies at the moment, most of them will be made up to pinpoint a particular issue in the discussions. I guess case studies will be useful in at least the Administration and Source/Citation areas. But those working on E&C may have to do the cases necessary for E&C.

One type of case studies I think might be useful is various examples of documentation methods – the output – or sketches of such. Also there could be various “data sets”, or “how do we handle this source type” (e.g. censuses).

I have posted several issues here http://bettergedcom.wikispaces.com/message/view/Defining+E%26C+for+BetterGEDCOM/37787542?o=40#37908106
These issues will have to be sorted into the various E&C pages. The issues I am not sure where to place in the proposed page structure are the benefit/drawback comparison between various models (the existing models will not go away any time soon and our model is likely to be challenged by alternatives, so we have to be prepared), and E&C for other things than persons.

I think we should spend some time digging up old discussions, and describe the topics in them. Could go in the archive. Simply having links is not very useful if you want to find a discussion where you know something has already been discussed. We may not want to spend too much time repeating the old discussions.

gthorud 2011-04-20T17:06:58-07:00

The system tells me that 2 replies (now probably increased to 3 by this posting) have been posted to Mike's initial posting, anything missing?

testuser42 2011-04-21T06:19:05-07:00

Negative Evidence in an E&C model

prev. discussion: http://bettergedcom.wikispaces.com/message/view/GOALS/30536663
also parts of: http://bettergedcom.wikispaces.com/message/view/Evidence+and+Conclusion+Process/35338758

testuser42 2011-04-21T06:28:14-07:00

Gene wrote in http://bettergedcom.wikispaces.com/message/view/Defining+E%26C+for+BetterGEDCOM/37924068#37977562

(b) In the body of evidence blog entry (BetterGEDCOM) [see http://bettergedcom.blogspot.com/2010/12/what-is-research-having-fun-with-body.html], there is a humorous take on the Native American Princess. While it was intended to make you chuckle, the evidence circumstance is still valid--how likely is it a beginning researcher recognizes _all_ the personas possible from any one item of evidence.

Let's try and see about that.
I think that usually it's not too hard to see how many "Personas" there are in one piece of evidence. If you find a source that mentions a Native American Princess, then you can make a Persona for her, if you believe she might be relevant.
In the case that a source is so confusing that you miss a few people that should get their own Persona, you can always just make them later and then link them to that source, maybe including a note that it wasn't easy to see these as separate or relevant earlier.

testuser42 2011-04-21T06:29:44-07:00

Actually, the above hasn't much to do with negative evidence - should be somewhere else?

testuser42 2011-04-21T06:39:03-07:00

Now, how can Negative Evidence be recorded?
Gene posted this: http://bettergedcom.wikispaces.com/message/view/Defining+E%26C+for+BetterGEDCOM/37924068#37977562
I think she already has it: "Research Notes". These would be a type of Note on the Level0, they could include pointers to the Sources that have been searched, and pointers to the Person Records that are concerned. That's it. It is independent of the way the Model for Persons works, if it has 1, 2, multi-levels.
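
As a minimal sketch of what such a "Research Note" could carry, here is a Python version. The `ResearchNote` class, its field names and the example ids are all assumptions for illustration; no such structure has been defined for BetterGEDCOM. The note points at the Sources searched and the Person Records concerned, and nothing in it depends on how many levels the person model has.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResearchNote:
    note_id: str
    text: str                                                   # the researcher's own words
    searched_sources: List[str] = field(default_factory=list)   # pointers to Source records
    persons: List[str] = field(default_factory=list)            # pointers to Person records


def notes_for(person_id: str, notes: List[ResearchNote]) -> List[ResearchNote]:
    """All research notes that concern one person record."""
    return [n for n in notes if person_id in n.persons]


# Hypothetical example: a search that found nothing for one person.
example = ResearchNote(
    note_id="N1",
    text="Searched the parish burial register 1840-1860; no entry found.",
    searched_sources=["S17"],
    persons=["P3-1"],
)
```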

AdrianB38 2011-04-21T12:46:33-07:00

So far as I can see, "Research Notes" is the only sane way of doing it. (There are insane ways involving Boolean algebra but ... hey, wake up there!)

It's worth stating that the question "how can Negative Evidence be recorded?" applies across the board - it applies in a Conclusion-only model just as much as an E&C.

GeneJ 2011-04-21T13:43:41-07:00

@Adrian wrote, "the question "how can Negative Evidence be recorded?" applies across the board - it applies in a Conclusion-only model just as much as an E&C."

Huh??? I record negative evidence all the time.

Greg said something like that way back when, but I think we put that to rest.

Here's where Greg brought it up:
http://bettergedcom.wikispaces.com/message/view/GOALS/30536663#30555817

See my response:
http://bettergedcom.wikispaces.com/message/view/GOALS/30536663#30555817

See the examples, "Negative Evidence":
http://bettergedcom.wikispaces.com/About+Citations

testuser42 2011-04-21T14:09:21-07:00

Gene, yes, you're doing it already. You write of a more general approach to record "negative search results" which uses a custom event called "research note-no evidence".
That's the way to do it. That's what Adrian and I are agreeing. That also has nothing to do with E&C. It has all to do with establishing a standard "Research Note" structure, so that you can export and import these. Nowadays, your custom event might export but there's no telling if or how it will import in other software.

AdrianB38 2011-04-21T14:41:57-07:00

Gene - I'm agreeing with you! You write your negative evidence in some sort of research note, which might be a citation, proof summary, proof argument, etc, etc. I imagine virtually everyone does something similar.

If we go for more of these entities that are referred to in my Research Process (and have previously been referred to as Admin stuff), then the richer choice _may_ make the question of where to put the negative evidence slightly more complex. But probably it's still pretty obvious - the search log will contain some list of what was to be searched, with what selection criteria. The negative evidence is simply the statement "Nothing Found" in the post-search version of the search log. Then you use that "nothing found" statement in the proof argument / summary / whatever. No dramatic change, it just pops up in 2 places.

The only reason for asking the question is to record what to you, me and TestUser is the obvious answer (more or less) and to see if there are any minor differences in our methods. And because I'm a pedantic so-and-so who just likes it clearly written down!!!

GeneJ 2011-04-21T16:00:10-07:00

@Adrian,

A negative search is different than negative evidence. A negative search might be the result of a process, while negative evidence is the result of analysis. See below, "an inference one can draw..."

... definition of negative evidence in Elizabeth Shown Mills, _Evidence Explained_, 1st ed., electronic, 2000, pg. 826 (and 25), "negative evidence: an inference one can draw from the absence of information that should exist under given circumstances."

http://bettergedcom.wikispaces.com/message/view/GOALS/30536663#30555817

:( Information is not evidence.

GeneJ 2011-04-21T16:16:52-07:00

pssst ... should have said,

Information in and of itself is not evidence. In DeadEnds, all information is evidence.

Tom and I hashed through the difference with that horrid and overly simplified city directory example.

testuser42 2011-04-21T16:41:27-07:00

A negative search might be the result of a process, while negative evidence is the result of analysis
OK, but do we really need to pick that nit? Can we record any of these anywhere else than in a Research Note of some kind?

Information in and of itself is not evidence.
Yes, "evidence is information that is relevant to a problem." (EE, from the Definitions Page)
In DeadEnds, all information is evidence.
I don't think so. You wouldn't record anything if you didn't think it's relevant to your problem!
Also from the Definitions: "Evidence (dictionary) ... information drawn from a document". This is what a Level 1 Record (aka Evidence Person or Evidence Event) is. It contains information drawn from a source, and it'll only be created if you thought that it was relevant.

GeneJ 2011-04-21T20:55:14-07:00

(1) Humm... Nit pick? For me, figuring out the various evidence is a skill. See discussions about the Research Process Map, Tom Jones Inferential Genealogy.
(2) ?Research note .. maybe "reference note." In a working file, the record of _all_ my evidence is in my reference notes.
(3) "..if you didn't think it's relevant to your problem!"
http://bettergedcom.wikispaces.com/message/view/Evidence+and+Conclusion+Process/35338758#35655992

Also, Adrian's page.

Hope this helps. --GJ

AdrianB38 2011-04-22T04:28:13-07:00

Guys - we are getting entirely too caught up in precise definitions of words. Yes, Gene, reading your statements on "A negative search might be the result of a process, while negative evidence is the result of analysis", I can see that makes perfect sense - BUT...

We started out with the question "how can Negative Evidence be recorded?". TestUser said "I think [Gene] already has it: "Research Notes""

If you want to clarify that it wouldn't be research notes as such but reference notes, then fine - but there's a whole lot of other stuff on this page that doesn't help us get to that. Yes, we understand that figuring out the evidence is a skill - totally agree with you, don't need cross references for it <grin>

As you yourself have pointed out, different people work in different ways and so long as we have something, somewhere, we're OK. If you want to put all your evidence into your reference notes - by which I presume you mean the bibliography and / or footnotes - then fine but I happen to want to do things differently and BG should allow us to both work how we want. I'd like to create a series of new entities related to the Research process and use them - but that's me.

At the end of the proverbial day, between bibliography and / or footnotes on the one hand and some yet to be defined Research entities, we have a home for negative evidence. And / or negative search results.

testuser42 2011-04-21T06:54:36-07:00

Rolling back and removing conclusions

It's been asked how to undo conclusions, and how to go back to a previous conclusion.
Here are some ideas.

Now thinking in a multi-level model:
If it's just the last conclusion, you could remove the latest conclusion Person.
If it's a conclusion that's been made earlier in the chain, then you can remove that Person and bridge the gap between the previous and the following one. This is not too hard: let's say these are persons in a tree: L1<L2<L3<L4<L5. To take out L3, just change the reference in L4 so that it links to L2 instead.

I would like the whole process to be reversible.
So you would not delete the link to L3, but add a "demoted" or "canceled" tag to it. Then it's still there in case you change your mind.
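
A minimal Python sketch of the relinking idea above, assuming each Person holds a single link to the person below it. The `Person` class, the `canceled_links` field and the function names are hypothetical illustrations, not part of any BG proposal.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Person:
    """A person record in a multi-level chain (hypothetical structure)."""
    id: str
    below: Optional["Person"] = None                     # active link to the next lower level
    canceled_links: list = field(default_factory=list)   # old links, kept so the change is reversible


def take_out(upper: Person) -> Person:
    """Bypass the person directly below `upper`, keeping the old link as 'canceled'."""
    removed = upper.below
    if removed is None:
        raise ValueError(f"{upper.id} has nothing below it to take out")
    upper.canceled_links.append(removed)   # demote the link rather than delete it
    upper.below = removed.below            # bridge the gap
    return removed


def restore(upper: Person) -> None:
    """Undo the most recent take_out on `upper`."""
    upper.below = upper.canceled_links.pop()


def chain(top: Person) -> list:
    """List the ids from `top` down to the bottom of the active chain."""
    ids = []
    node = top
    while node is not None:
        ids.append(node.id)
        node = node.below
    return ids


# Build L1<L2<L3<L4<L5, then take L3 out by relinking L4 to L2.
l1 = Person("L1")
l2 = Person("L2", l1)
l3 = Person("L3", l2)
l4 = Person("L4", l3)
l5 = Person("L5", l4)

take_out(l4)
after_removal = chain(l5)   # L3 is bypassed but still held in l4.canceled_links
restore(l4)
after_restore = chain(l5)   # the original chain is back
```

Because the canceled link is kept rather than deleted, changing your mind is a single step.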

testuser42 2011-04-21T15:35:49-07:00

I think a 2-tier-model at least leaves the L1 Persons (and other Evidence Records) intact, and only messes with the L2 Persons. In a single level, there are no Evidence Records at all.
I thought that a "timestamp" or similar could mark the changes in the L2 person that "belong together", so that they can be undone together...
But this is nowhere near as neat and tidy as a multi-level model!

The only argument against multiple tiers I can think of is that they might have a bit more overhead in the file, and don't read well for a human eye. But that shouldn't count, really.

gthorud 2011-04-25T17:22:37-07:00

testuser,

Regarding your first posting, where you hide the L3 person: I would like to see a practical example where this is needed. It may not be a simple thing if you, for example, have canceled lower level events when creating L3, and you are also likely to have things at L4 and L5 that depend on L3.

testuser42 2011-04-26T08:30:52-07:00

Hi Geir,
I'm aware that this could be complicated -- I'm just taking guesses at possible solutions. Someone with knowledge of the technical side of things would probably have better guesses ;)

That said, here's my take.
You might want to take out the L3 person because you realize it's not really the same as the others.
L3 combines L2 and some new stuff, which is in E3. Taking away L3 must only remove what is in E3.
If there were no conflicting PFACTs in L2 and L3, then L3 would really be "empty" and could easily be removed.
If there have been clashes between PFACTs from L2 and from E3, these should have been cleared in L3 (pointers or copying up). So when you undo L3, you look inside it to see what the clash was and what the solution was.
A preferred PFACT that came from L2 is fine, you can just delete L3 then because L2 is still there.
If the preferred PFACT comes from E3, it might have been used in the higher-up persons L4, L5 etc for other clashes. (If there were no clashes about that PFACT higher up, then you can just delete L3 and the PFACT from L2 will take over because it's the only one left).
But if e.g. L4 and L3 had a clash about that PFACT, then there are again 2 possible cases: The chosen PFACT could have been from E4 or from L3. In both cases, I think we replace the link L4>L3 with L4>L2 and ask the user which PFACT he thinks best now.
It might be good for the software to show all available alternatives for that PFACT in that situation and ask for a re-evaluation or affirmation. (There might be other alternatives higher up the tree).

(Note: instead of the new data in L3 or L4 coming directly from E3 or E4, it could also come from another Conclusion Person on top of its own tree)
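
The cases above could be sketched roughly as follows. This is only an illustration under several assumptions: each conclusion person is a plain dict with a link `below`, the PFACTs drawn from its own new evidence (`evidence`), and a `prefer` entry for each clash it resolved ("below" or "evidence"). All of these names and the values are hypothetical.

```python
def effective(person):
    """Resolve the PFACT values visible at `person`, honouring its clash decisions."""
    lower = effective(person["below"]) if person["below"] else {}
    merged = dict(lower)
    for kind, value in person["evidence"].items():
        clash = kind in lower and lower[kind] != value
        # Without a clash the new value is simply added; with a clash it only
        # wins if the user preferred the evidence side.
        if not clash or person["prefer"].get(kind) == "evidence":
            merged[kind] = value
    return merged


def remove_below(upper):
    """Relink `upper` past the person below it and list the PFACTs to re-evaluate."""
    removed = upper["below"]
    upper["below"] = removed["below"]
    lower = effective(upper["below"]) if upper["below"] else {}
    review = {}
    for kind, value in upper["evidence"].items():
        had_clash = kind in upper["prefer"]                # a clash was resolved here earlier
        has_clash = kind in lower and lower[kind] != value
        if had_clash or has_clash:
            alternatives = {value}
            if kind in lower:
                alternatives.add(lower[kind])
            review[kind] = sorted(alternatives)            # show the user what is available now
    return review


# L2 has a:1. L3 preferred a:3 from E3. L4 clashed (E4 says a:2) and preferred L3's value.
l2 = {"below": None, "evidence": {"a": 1, "b": 5}, "prefer": {}}
l3 = {"below": l2, "evidence": {"a": 3}, "prefer": {"a": "evidence"}}
l4 = {"below": l3, "evidence": {"a": 2, "c": 7}, "prefer": {"a": "below"}}

before = effective(l4)
to_review = remove_below(l4)   # L4 now links to L2; PFACT "a" needs a fresh decision
```

Here only PFACT "a" is flagged, with the alternatives 1 (from L2) and 2 (from E4); the value 3 disappears together with L3.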

testuser42 2011-04-26T10:28:57-07:00

Another thought:
What you actually are doing when you remove a Person from a tree is to remove the data from a certain source (or certain sources) from this tree. So there might be a way of working from the bottom up.

I've uploaded more graphics to help myself think about the process. Here they are:

remove-undo-permutations.pdf

remove-undo-steps.pdf

gthorud 2011-04-26T13:01:47-07:00

hi testuser,

I have trouble understanding the notation a:1, a:2 - I don't understand the numbers, and also what does a:173? mean. Thus, I also have a problem understanding the difference between the 4 cases. I assume a,b,c are PFACT values.

testuser42 2011-04-27T03:46:11-07:00

"a" is a type of PFACT, the value of which is 1 in a:1. "a:2" means there's the same kind of PFACT, but now its value is 2. So there's a "clash" there. Maybe equal-signs would've been better than colons?

The red reads "a:1/3?" (it's a slash and not the number seven).
"a" is the type of PFACT and the possible values are 1 or 3, about which a question ("?") should be posed.

gthorud 2011-04-28T04:15:42-07:00

Thank you, testuser. The diagrams make sense now.

One obvious conclusion is that you can't simply remove a level, you will have to reconsider values and pointers in all higher levels. And considering there could easily be 10 levels, that might be a lot of work.

Re. Rollback - which requires functionality in addition to removal of levels. Rollback will require storage of almost all previous values, not only E/C-persons, and could be a very complex thing. I am not sure how important it is to have structures necessary for rollback in a BG file. Could it not simply be an internal thing in a program?

AdrianB38 2011-04-28T10:10:50-07:00

Without delving too deeply into the possible algorithms, I currently tend to favour a "nuke from orbit, it's the only way to be sure" approach to roll back.

But this is a first thought, so I may be going in over the top.

In the diagram, we are deciding that P11 and P12 are not the same person. I would _not_ at this stage, want to say - "But P13 is." (You might be pretty certain they are, but let's do this 1 step at a time to start with).

P21 is therefore "removed". (I think I'd delete it but the option of keeping it in a dead state is interesting.)

As P21 no longer exists, then any proof argument / proof summary / quick one-liner that uses P21 as an input is now no longer valid. In particular, P31 needs to be similarly "removed".

Thus, my step 1 would have only P11, P12 and P13 left.
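
A small Python sketch of this cascading removal. It assumes, reading from the description here, that P21 was built from P11 and P12, and P31 from P21 and P13; the `built_from` mapping and the function name are hypothetical.

```python
def cascade_remove(built_from, doomed):
    """Return `doomed` plus every person whose inputs include a removed person."""
    removed = {doomed}
    changed = True
    while changed:
        changed = False
        for person, inputs in built_from.items():
            if person not in removed and removed & set(inputs):
                removed.add(person)
                changed = True
    return removed


# Which persons each person was concluded from (empty list = evidence person).
built_from = {
    "P11": [],
    "P12": [],
    "P13": [],
    "P21": ["P11", "P12"],
    "P31": ["P21", "P13"],
}

removed = cascade_remove(built_from, "P21")
remaining = sorted(set(built_from) - removed)
```

With these assumed links, `removed` holds P21 and P31, and only the three evidence persons remain.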

I would then have to write a new bit of logic that equates P11 and P13 and produces a P32 (P22 might be more appropriate, I guess). It has to be new logic because if the original logic of equating to P13 relied on a value inherited from P12, then that logic is no longer valid.

The problem with this idea is that if the match of P21 to P31 were some quite complicated logic, I really wouldn't appreciate the genealogical software nuking that argument out of existence so I couldn't read it again.

(If it's Tom's automatic combination software, it's not an issue but I don't think we're there yet with that matching algorithm!)

I see 2 possible ways to salvage something from the wreckage brought on by P21's demise:
1. Keep the documentation of the logic matching P21 to P13, but keep it in a dead, deceased, ex-accepted format. Then you can create a new proof, read the old one, copy and paste into the new one to taste.
2. Document the logic matching P21 to P13 to a much, much finer degree, saying exactly which values have been used as input to the argument. On the removal of P21, this argument also lapses (as per case 1) but since you can now see exactly what the logic was, right down to individual PFACT level, then you can say - "Ah, while P21 no longer exists, all the logic applies to P11 just as much - so I'll simply amend the inputs to the argument to refer to P11 and it's now valid again." This would be a manual process.

Essentially, the 2 processes come up with the same output - a correct matching of P11 and P13, but one takes the long way of creating a new (copied) argument. The other amends just a few items to get there.

The 2nd method would have implications for the data model since we'd need the new, finer level of data - these would come from the research / admin entities in the Genealogical Research process, which is why I am so keen in looking at that.

HOWEVER - do I really see the ordinary user keeping the exact detail of what was used in the argument? Err. No.

testuser42 2011-05-01T07:28:13-07:00

Geir said:

One obvious conclusion is that you can't simply remove a level, you will have to reconsider values and pointers in all higher levels. And considering there could easily be 10 levels, that might be a lot of work.

Yes, could be. But then again, these 10 levels will usually not have more than 10-20 actual PFACTs between them, and most will not be clashing with others. So, a user interface might present an overview of the (relatively few) conflicting PFACTs and the user manually selects his favourites (or not - leaving things as "undecided"). That shouldn't be much work.

I believe we don't need to be afraid about how difficult the internal workings of the software are. I'm sure computers are fast enough for that stuff. Also, the BG file normally is not the internal database of the software, it's more of a well-ordered snapshot of the internal data.

testuser42 2011-05-01T07:49:15-07:00

Geir: Re Rollback: Could it not simply be an internal thing in a program?
You're probably right! I've always imagined BG would have tags for "changed on ... by ..." on every Record or even every PFACT. Maybe that'd be enough to write a roll-back routine.

Adrian, I like the language/tone you're using ;)

I see 2 possible ways to salvage something from the wreckage brought on by P21's demise:

Good thoughts. Need to wrap my head around those a bit more.

You're right that the higher-level P31 (and further up!) could/should? be dismantled as well. Essentially, everything gets torn down to the last Level that was independent of the removed Person. Then these top Persons that are left get thrown together and re-combined, just as in the regular process.

But... this all happens internally in the software, doesn't it? That would mean, the software can present the old conclusions and "reasons" to the user (it has that stuff in its cache). The user then accepts them or changes them and only the results get neatly written in the BG, if you want it. Hm.?

AdrianB38 2011-05-01T10:06:49-07:00

"this all happens internally in the software, doesn't it?"
Yes

"That would mean, the software can present the old conclusions and 'reasons' to the user (it has that stuff in its cache)"
Yes. Not quite sure if I'd thought that bit through beforehand, but yes.

In reality, we are getting close to what the essence of the E&C Model is:
- pinpointing where the wreckage MIGHT start from in the case of a mismatch being found
- being able to review the subsequent logic, clearly separated from other logic, to see if that's still a/v.

All of which CAN be done in an Evidence-only model. But hey, we don't actually NEED a computer system. We're just trying to make things easier for those detail freaks like myself. Hopefully without carving up the situation for the rest of the world.

But I think those 2 bullets are what lie at the crux of the need for E&C.

AdrianB38 2011-05-01T10:07:38-07:00

D'oh
- I mean "being able to review the subsequent logic, clearly separated from other logic, to see if that's still applicable"

testuser42 2011-04-21T07:08:33-07:00

Another possibility in the multi-tier model would be to add a link to the highest Person (L5) that points to the Evidence Person that you want out of your tree. That way, all the previous stuff would be left alone, and just a new conclusion added, like usual.

      L3
     /| \
   L2 |  link going back
  /|  |  to E1 and saying
L1 |  |  "cancel this"
|  |  |
E1 E2 E3

testuser42 2011-04-21T07:09:41-07:00

In a 2-tier model, the new conclusions would be added to the existing L2-Person. The new conclusions (PFACTs) would also be based on and linked with an Evidence Person and its Source. So to remove a conclusion, you'd just "cancel" the PFACT you don't like anymore.

testuser42 2011-04-21T07:18:40-07:00

The specialty of a multi-tier model seems to be making bigger changes quickly.
Say you've two well formed Persons P1 and P2 on top of their own tree, each linking lots of evidence together. Now you find a source that convinces you P1 and P2 are the same. You'll probably have an Evidence P3 based on that source, and then a P4 that combines P1, 2 and 3.
If you later want to take that apart again, you just remove P4 (or cancel the links to both P1 and P2) and you've got your old persons back.

In a 2-tier model, you would have to look at all the PFACTs and see where they came from. If every change had a "timestamp" then that might make that untangling easy enough, but otherwise, it may be more work than in a multi-level model.

AdrianB38 2011-04-21T13:18:28-07:00

The issue for me in a 2 tier model is that I can't see how it's better than a single tier!

Take the roll-back idea.... Say I have 3 major sources - A, B and C and that by themselves you can't say they all apply to the same person. However, let's say that A and B can be recognised as applying to one person. When A and B are combined then, and only then, is it possible to recognise that they refer to the same person as mentioned in C.

If I only have a 2 tier model, then I have 3 lowest-level people all suddenly coming together in 1 go. Which simply isn't true. I have to write the full description of the ordering in the text of the proof argument that backs up the triple-linking.

Now suppose I suddenly find that I have got source C totally wrong in my analysis - maybe I've misread the source!!! I have to remove C from the reckoning. But I don't need to split the person referred to in Source A from the one referred to in Source B - that conclusion is not affected.

In a multi-level model, I just peel off the person represented from C, leaving the person who represents the total of data from A and B.

In the 2 level model, peeling off the person represented from C, also splits up the data from A and B, and I have to go back and read my notes to see whether A and B can still be matched. And then do that bit again.
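
The difference can be shown with a short Python sketch. It assumes each conclusion is recorded only as the set of inputs it was drawn from; the names `AB` and `ABC` and the function are hypothetical.

```python
def surviving(conclusions, dropped):
    """Conclusions that still stand once `dropped` is taken out of the reckoning."""
    invalid = {dropped}
    changed = True
    while changed:
        changed = False
        for name, inputs in conclusions.items():
            if name not in invalid and invalid & inputs:
                invalid.add(name)   # anything built on an invalid input falls too
                changed = True
    return sorted(set(conclusions) - invalid)


# Multi-level: A and B are combined first, then the result is matched to C.
multi_level = {"AB": {"A", "B"}, "ABC": {"AB", "C"}}
# Two-tier: one conclusion person links to all three evidence persons at once.
two_tier = {"ABC": {"A", "B", "C"}}

left_multi = surviving(multi_level, "C")
left_two_tier = surviving(two_tier, "C")
```

In the multi-level case the A+B conclusion survives the removal of C; in the two-tier case nothing does, so the A+B match has to be argued again.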

Yes, I've just done a very simple run through and real life might often be more complex than this.

testuser42 2011-04-21T16:28:47-07:00

Records and their Source(-References)

See also Discussion Extraction Entities and Source Citations at Mike's "BetterGEDCOM Attempt"

Just trying to clarify and sum up:
(I'm trying to use "Source" and "Source Reference" to be precise. Bear with me...)

In a multi-tier model, the only Records that have a Source are L1 Records. They always have one and only one Source.
The higher-level Records don't need a SourceRef. They point to the Records below them, and these again to lower ones, until they reach Level 1. You could call the links from the higher to the lower levels "SourceRefs" - but aren't they just "links" until L1 links to L0?
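
As a rough illustration of following the links down to Level 1, here is a Python sketch. The record ids, the `links` mapping (higher record to the records below it) and the `source_of` mapping (L1 record to its one Source) are all made up for the example.

```python
def sources_of(record_id, links, source_of):
    """Collect the Sources reachable from a record by following links down to L1."""
    found = set()
    seen = set()
    stack = [record_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in source_of:               # an L1 record: it has exactly one Source
            found.add(source_of[current])
        stack.extend(links.get(current, []))   # higher records only point downwards
    return sorted(found)


links = {
    "xyz3": ["xy2", "z1"],   # Level 3 record built on a Level 2 record and an L1 record
    "xy2": ["x1", "y1"],     # Level 2 record built on two L1 records
}
source_of = {"x1": "S1", "y1": "S2", "z1": "S3"}

top_sources = sources_of("xyz3", links, source_of)
mid_sources = sources_of("xy2", links, source_of)
```

The higher records carry no SourceRef of their own in this sketch; their sources are derived entirely from the links.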

Mike has a point when he says that the only true Source of a higher-level Record is whoever created that Record to bring together two or more lower Records.

In a Model with 2 tiers, the L1 Records again are the only ones that have a Source. The L2 Records link to L1 Records, but also the PFACTs inside them could have (several) links to L1 Records.
(Again, a possible Source for the L2 Record is the creator, but since there are more conclusions stuffed inside one record, there might be more than one creator...)

A single-level Model needs direct SourceRefs from the PFACTs in its Records to the Sources. There's no use for a SourceRef for the Record itself.

Backwards Compatibility?
Should we allow SourceRefs from PFACTs in any Record? It would make it easy to adapt GEDCOM files - no need to shuffle anything.
It would allow using all three Models simultaneously... Not sure if that possibility's a good thing ;)

testuser42 2011-04-21T16:58:47-07:00

If you look at it from the Source upward - what can a Source be linked to?

Anything that can be "extracted" from the Source, so: any L1 Record like Persons, Events, Groups or Families...

Possibly the PFACTs inside a Record.

But also Note Records (especially Research Notes - to show things like: "no John Doe found in this document")
Other administration records?

In the other direction, a Source would link to a Repository.

Anything else?

AdrianB38 2011-04-22T04:05:55-07:00

"There's no use for a SourceRef for the Record itself"

Um. Not sure on this. I do record a citation against a person at their highest level (i.e. against the person, not their PFACTs) to say that this is the evidence for their existence. Useful sometimes where you know nothing other than someone's existence - e.g. a will that says "To my son I leave ..." and doesn't name him. There will be some who refuse to stick a record into their database for the son, thus avoiding this circumstance. Fair enough but my preference is to add the person because otherwise I'll forget the fact of their existence. So in my view there is a need for the ability to record a source for the existence of a person. Which I think means there is a use for a SourceRef for the Record itself.

Re Links from the Source - I suspect yes, there are other "admin records" that a Source could / would need to be linked to but right now I'm not wholly sure we have an agreed list of Admin entities - by the way, I HATE the term Administration for these entities. They're involved with Research not Admin. Admin, to me, is taking note of your expenses! (A very necessary step if you're a professional, of course!)

I think Source to Source links need investigating - imagine a digitisation of a transcript of an original parish register. Rather than trying to cram all the relevant details for the bibliography and footnote citations into one Source, there MAY be sense in recording all 3 in the database, linked up, in order to generate a multi-part (multi-level?) citation sensibly. But I don't know...

testuser42 2011-04-22T04:43:48-07:00

Ah, good point. So there ARE good reasons for SourceRefs directly attached to Persons.
So, we should still allow that in BG - someone might need it, it may make GEDCOM import easier, and it shouldn't interfere with other modes of working.

"Research" is a better term, too. Mark Tucker's "Genealogy Research Map" could be a guide to terminology, it sums up various ESM and BCG concepts nicely.

Source to Source links could be cool - again the idea of a "Source tree"... Not sure if it's needed, but still think it's cool.

gthorud 2011-04-25T13:54:54-07:00

I am writing with my left hand, due to a right shoulder problem, so have to limit my writing.

We have to be backwards compatible, and conclusion persons imported from Gedcom will have links to “sources” (actually citations) in events. So, one simple way to do it is, if you merge two events at L1 into a conclusion level event, to “cite” directly from the conclusion level event – i.e. copy the reference upwards – assuming the two L1 events are canceled after the merge. You might even want to change the reasoning in the citation, if any.

Source to source links. Somewhere on a hard disk, I have PHP code for a source database that I started to develop 10+ years ago. It had such links, with a type field – i.e. is digital copy of, is a digital copy of part of, is transcript of, is part of, is abstract of, is review of – and more. The tricky one is “is part of” – i.e. multi level sources – to be discussed.
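
A hedged Python sketch of typed source-to-source links, using the digitised transcript example from earlier in this thread. The `Source` class and its fields are hypothetical and are not taken from the PHP code mentioned above; only the link type names come from the list in this post.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Source:
    """A source with at most one typed link to another source (hypothetical)."""
    title: str
    link_type: Optional[str] = None      # e.g. "is transcript of", "is digital copy of"
    target: Optional["Source"] = None    # the source this one is derived from


def citation_chain(source: Source) -> str:
    """Follow the links from a source back to the original, one part per link."""
    parts = [source.title]
    while source.target is not None:
        parts.append(f"{source.link_type} {source.target.title}")
        source = source.target
    return "; ".join(parts)


register = Source("original parish register")
transcript = Source("transcript", "is transcript of", register)
scan = Source("digital image", "is digital copy of", transcript)

citation = citation_chain(scan)
```

A single link per source is enough for this chain. The multi level "is part of" case would need more than this sketch offers.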

testuser42 2011-04-22T11:21:03-07:00

Persons and Events

I've uploaded another graphic:

multi-events.pdf

multi-events.png

This concerns possible handling of Events.

I've not yet drawn up a "L2-Event" -- I imagine it being quite similar to an L2 Person.

testuser42 2011-04-22T11:44:57-07:00

This time I've used "pointers" to resolve clashes. In GEDCOMese, it might look like this:

0 @P31@ INDI
  1 PFACTC @P21@
  ...
  1 PFACTF @E12@
    2 NOTE (Poodle)
  //or maybe
  1 EVEN
    2 PFACTF @E12@
      3 NOTE (Poodle)

If you want to take one Event and make it the preferred one, including all its PFACTs, you could have a simple pointer like
1 EVEN @E11@
in the Person.

But how best to mix-n-match parts of Events?
I imagined the PFACTs of the Events can be treated just like those in Persons.

Maybe it would be wiser to combine Events into a L2 Event first? not in all cases?

AdrianB38 2011-04-23T12:39:06-07:00

"I imagined the PFACTs of the Events can be treated just like those in Persons"

Without thinking through the implications too much, this has to be a possibility. If you can have evidence and conclusion persons, then you have to be able to have evidence and conclusion families, evidence and conclusion multi-person events, evidence and conclusion groups, evidence and conclusion places, etc. Anything that exists in the real world, that we wish to study via records about it, should have evidence and conclusion versions.

No - not evidence and conclusion sources because we can study those directly!!

Then anything that applies to evidence and conclusion persons, also applies to evidence and conclusion anythings.

All that is something of a theoretical statement....

For PFACTs of Events, I'm not sure - at the moment I was only imagining Date, Location, Notes, Cause, etc, all of which make up one PFACT - whether or not you wish to handle the components of Events (or PFACTs, a.k.a. attributes) is a question that applies outside E&C.

gthorud 2011-04-24T11:11:00-07:00

My bone marrow is very skeptical to breaking up an event and have E&C for parts of events. It will be more complicated, and will also add more overhead. If you have to add a note to an event I think you should create a new version of the event. The simplest thing is to have everything in the event in one place.

testuser42 2011-04-26T10:39:47-07:00

I was thinking mainly of Date and Location as "PFACTs" of an Event, but I thought there might be others I can't imagine now. Maybe you're right, Adrian, and there's actually just one PFACT: Date'n'Place?

The simplest thing is to have everything in the event in one place.
I agree. It's much better. Forget the above...

testuser42 2011-04-26T11:00:35-07:00

So let's say in my Graphic I've an L2 Conclusion Event. Inside are links to E11 and E12, and the clashing PFACTs are sorted out (with pointers: D: E12; E: E11; F: E12). There's an addition to F that says "Poodle". There's reasoning attached.

Now which Persons does that Event link to, if any?
Possibilities:

NONE -- Persons are indirectly linked through the roles of the lower level Events.
ALL -- Direct links (via Roles?) to all of the L1 Persons that had a Role in the L1 Events. Some L1 Persons will turn out to be connected, but is that a problem?
SOME -- if we could find the topmost Person for connected Persons. But that's crap, I think.

We could make additional links from a high-level Person to a Conclusion Event just for clarity and shortcuts.
A good software would probably find the Event-trees that are connected to a Person-tree and then ask about that. But would we let the software create such links by itself? Internally, sure. But in a BG export file? Why not - it's just repeating the facts...

testuser42 2011-04-26T11:01:15-07:00

^^ so "SOME" is not pure crap, I think ;)

gthorud 2011-04-30T18:14:39-07:00

Reasoning notes for CPs - Collapsing multiple levels into one

The current assumption is that each Conclusion Person will have a (research) note capturing the reasoning behind the merging of the next lower level persons. I am aware that there are issues about where that information should be output in citations etc, which is highly relevant, but I leave that aside at the moment.

When a multi level E&C structure is imported into a one level, conclusion only, program, the latest statement about how that should be done is that all "reasoning notes" are simply copied into the single level.

As I have stated before, that will not work unless each note contains a precise reference to the facts (eg. events) it is about (by using eg footnotes for citations), and may not work even then.

In many cases, I fear that the note will be written in the context of the information available when the lower level CP is created (the user will see the info about current underlying persons on the screen) and the user will not include references in the note. That context for the note (the lower level CP) will be lost when many levels are collapsed into one.

A similar problem will occur if the user prints the available information about the current CP, which is "summarized" from the lower levels, and wants to include the reasoning notes. The underlying tree structure could be reflected in special reports, but the structure is unlikely to be output in normal published reports. Considering that there could easily be 10 levels, these notes would most likely not make much sense if concatenated in a published report, even if they referred precisely to each fact.

The reasoning notes may be useful only to the user who wrote them, and only on multi level systems. Given that the user will often want the reasoning to appear somewhere in published reports, it is probably better to allow the user to record the reasoning (once) in other ways that are more suited for output in published reports - and which will transfer to a single level program without problems.

Solutions TBD.

AdrianB38 2011-05-01T04:54:24-07:00

Geir raises an important point here - will the output be meaningful?

The reader may be tempted to dismiss this as just another reason to avoid the Evidence & Conclusion Model. Who wants to read why Evidence Person 4472 is the same as EP4498?

I suggest this is NOT a minor detail - the reasoning why EP4472 is the same as EP4498 is the SAME reasoning why the baptism of this John Doe here, refers to the same John Doe as that marriage there. Omit one and you've omitted the other.

My thought (at the moment) is that the only way to produce a meaningful output is to take the bull by the horns and actually list out the arguments why EP4472 is the same as EP4498, using pretty much that terminology (with the addition of key details like the names, etc). The first EP would be described in terms of what source it's taken from. The rest in terms of what they merge. Anything less results in a context-free, meaningless mess, as Geir suggests.

"it is probably better to allow the user to record the reasoning (once) in other ways that are more suited for output in published reports" - perfectly happy with this idea, so long as it doesn't stop me putting it in several areas, broken down by which logical argument it applies to.

Let me re-iterate the importance of what we're talking about here
- the reasoning why EP4472 is the same as EP4498 is the SAME reasoning why the baptism of this John Doe here, refers to the same John Doe as that marriage there;
- it is crucial logic;
- to step up onto MY own soapbox, in too many cases this logic is missing from view - apparently beautifully cited evidence that seems to rely on John Doe being the only person of that name in the USA!
- yes, I'm sure that you, the reader, do record exactly this logic in a citation / footnote / shared notes, etc. So do I. It's the rest of 'em I'm worried about.

testuser42 2011-05-01T07:54:15-07:00

Maybe we should take a short example for a test ride, with some realistic values and sentences. Then we could see what happens where!

Collapsing a multi-level BG into a single-level might not work without manually rewriting some of these sentences. If a programme has only a single level internally, it will have to offer a few smart import dialogues for the richer BGs that would come out of other software.

AdrianB38 2011-05-01T10:58:46-07:00

"If a programme has only a single level internally, it will have to offer a few smart import dialogues for the richer BGs that would come out of other software."

Yuk

Sounds like BG should make the recommendation that any s/ware capable of storing multi-level stuff, i.e. E&C, needs to be able to output a flat, single level interface file. It is simply not fair to expect a s/w developer who's not gone down the road of E&C BG, to be able to parse such an input.

(And yes, the implication is that I do think that there should be 2 levels of BG-compatibility - 1 with E&C and one without.)

louiskessler 2011-05-01T12:15:18-07:00

"It is simply not fair to expect a s/w developer who's not gone down the road of E&C BG, to be able to parse such an input."

Thank you Adrian. This should be the attitude. If you want acceptance by the programming community, BG should be written at a general level that will accept everything, but not require anything.

Maybe it's not quite as loose as that, but that's the idea.

Louis

gthorud 2011-05-02T16:00:43-07:00

Re. Adrian’s first posting:

I agree that the only sensible way to describe the reasoning in a published report is to include the “context” in the reasoning – or use a footnote/endnote (– unless we can come up with some other smart way to do it). But if the reasoning is not intended for such output, just a note for the researcher’s internal use, the reasoning could be much simpler as shown below – and will then cause problems after merging into one level.

Testuser asked for an example.

It is easy to create examples since there will be a lot of users writing sloppy reasoning notes.

Assume you have a christening record and a probate record, which you merge into a CP called Peter.

A sloppily written reasoning note could be “The child in the christening record is most likely the same as Peter mentioned in the probate, since there is no other Peter christened in the parish that fits with the age in the probate.”

This note makes perfect sense since you only have two records, and the note is unambiguous. Few users will write more than this, because there is really no need – at the time.

Then, you add a confirmation record, a marriage record, a few christening records (one child called Peter) and a burial record, and finally a probate for the first Peter, most likely inherited by his son Peter. For each record added you will have other reasoning notes.

If you merge the tree into one level, and simply collect all the notes and PFACTS, which Peter will the first note refer to? The first one, or his son? Well, you might be able to figure it out, but it will not be as easy as when the note was first created.
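
The naive merge described here can be sketched in a few lines of Python. This is only an illustration: the `Person` class, its fields and the tree shape are assumptions made for the example, not anything defined by BG.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    """Hypothetical E&C person; `built_from` points down at lower-level persons."""
    label: str
    level: int
    notes: list[str] = field(default_factory=list)
    built_from: list["Person"] = field(default_factory=list)


def flatten_notes(person: Person) -> list[str]:
    """Collect every note in the subtree, discarding the level each was written at."""
    collected = list(person.notes)
    for lower in person.built_from:
        collected.extend(flatten_notes(lower))
    return collected


SLOPPY = ("The child in the christening record is most likely the same "
          "as Peter mentioned in the probate.")
LATER = "The later probate was most likely inherited by his son Peter."

# Level 2 was created when only two records existed, so the note was unambiguous.
early = Person("Peter", 2, [SLOPPY],
               [Person("christening", 1), Person("probate", 1)])
# Later records (another christening, another probate) are added above it.
top = Person("Peter", 3, [LATER],
             [early, Person("child's christening", 1), Person("second probate", 1)])

flat = flatten_notes(top)
```

After flattening, `flat` holds both notes side by side, and nothing says which christening or which probate the first note meant. That is the ambiguity in question.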

I would like you to consider an alternative, or complementary, solution to the currently proposed notes, which, unless merged to one level, remain at the level where they are written. You might call it “a climbing reasoning note”; it is actually the “old” idea of copying upwards - and maybe someone considers this solution to be in the model already.

The content of the note will be as described by Adrian above.

For each person there will be a special type of note, just one note (or a structure of notes which might be needed for other purposes), that is copied to the level above when a superior CP is created. This may imply merging of two (or more, we have somewhere said that we will not only have binary trees) notes – and it is very likely that the user will have to edit the notes after merging – just as today when merging persons. The difference, compared to the currently proposed reasoning notes, is that the new (merged) CP note will exclude any such notes at the lower levels, but the lower level notes are not deleted unless you merge to one level (when they are deleted).
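
A minimal sketch of the climbing note, assuming a hypothetical `Person` with one `climbing_note` field. The automatic merge is just a concatenation that the user would be expected to edit afterwards; all names here are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    """Hypothetical person holding exactly one climbing note."""
    label: str
    level: int
    climbing_note: str = ""
    built_from: list["Person"] = field(default_factory=list)


def create_superior(label: str, lower: list[Person]) -> Person:
    """Create a CP one level above `lower` and copy the climbing notes upwards."""
    level = max(p.level for p in lower) + 1
    # Naive merge: join the non-empty lower-level notes for the user to edit.
    merged = "\n".join(p.climbing_note for p in lower if p.climbing_note)
    return Person(label, level, merged, list(lower))


def visible_note(person: Person) -> str:
    """Only the note at this level is shown; lower-level notes stay stored but excluded."""
    return person.climbing_note
```

The lower-level notes are left untouched, so a later merge to one level could still decide to delete them, as described above.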

I think this is analogous to what you would do today in a conclusion only model, where you start out with a small note, and add to it as you add more facts to the conclusion person. (There will also be footnotes and whatever, this is only part of the solution.)

Where the climbing note is output in a report, and if there will be other notes for the person containing data (not reasoning) is something to look into – although the report output will not be defined by BG.

Regarding Adrian’s suggestion to do the merging into one level before export. That will move the burden from the vendors to the user. The only program that knows if a merging is necessary is the importing one – or you require the exporting user to know the capabilities of the importing program. Requiring the user to know this is not a good solution, the user may not know anything about the importing program at all.

The best solution for the user is that the importing program does the merging, if necessary. So, in my view, unless the E&C structure becomes very complicated, much more complicated than currently seen, I think we should try to describe an algorithm that an importing program can use – that will also be part of the proof of concept for the E&C model. If you expect an E&C program to traverse the structure related to the persons shown on screen, perhaps 20 persons or more as some programs do, each time you change the view to a new person – the algorithm can’t be too complex. When we see what it takes to do the merging, we can make a decision about where the merging takes place.

An alternative might be to export both multiple and single level structures in the same BG-file – possibly by essentially duplicating a lot of pointers to events and notes. (I am not concerned about the file size at all.)

AdrianB38 2011-05-03T02:54:01-07:00

Geir's idea about merging up notes is interesting - I think it would only apply to notes that are linked just to a person. Any note linked to an event or an attribute would carry the context with it.

However - having sat here for some minutes, I think we may be missing the context of how this data would appear to the user. Or perhaps, how it COULD appear. We're getting dangerously close to designing the application rather than the data model but as I have said before in defence of this closeness, we need to assure ourselves that it is possible to use the data model in practical algorithms.

So - how it COULD appear and SHOULD appear is, I suggest, like this:

Remember what Tom said about how the personas and persons appear to users of newFamilySearch.... They CANNOT see the difference and CANNOT see the individual personas making up a person. I am now firmly of the belief that this is how any application using an E&C Model / BetterGEDCOM compliant database must work on normal data entry.

1. Enter the original baptism. This creates a source record and a level1-person (i.e. an evidence-only-person or in nFS terms, a persona) with that context sensitive note.

2. Over months, enter other details, developing a conclusion person at level-10 (say) with the probate and other children's baptisms as Geir suggests. At no time do we realise that this is the same human being as we entered in step 1 above.

3. We realise that finally, we have enough information about the level-10 person from step-2, to 'prove', to GPS standards, that they are the same human being as we entered in step 1 above.

In the application's GUI we only see TWO people. Behind the scenes, one is that level-1 person from step-1, the other is a merged representation currently held at level-10. We mentally say - yes, these are the same person and press the button in the GUI to say that they are the same person and please "merge" them.

4. The application merges them so on screen we have just one person. Behind the scenes it is now a level-11 person linked back to a level-1 person plus a level-10 person.

At that point it is incumbent on the user to just read through the data on the screen to see if it all makes sense. At that point, they find there is an ambiguous note (because it depended on a context that isn't there any more) and alter that text to read "The child in the >>>1787<<<< christening record is most likely the same as Peter mentioned in the >>>1850<<< probate, since ...."

This will insert the new text for the note with the new level-11 person, with the level-1 note now being linked to say it's superseded. It doesn't update the level1 note because that goes against the principle that an update should not destroy a previous version of the data.
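
Steps 1 to 4 and the note revision might look like this behind the scenes. It is a sketch only: the `Person` and `Note` classes and the `superseded_by` link are assumptions for illustration, not part of any BG draft.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Note:
    text: str
    superseded_by: Optional["Note"] = None  # link to the replacing note, if any


@dataclass
class Person:
    level: int
    notes: list[Note] = field(default_factory=list)
    built_from: list["Person"] = field(default_factory=list)


def merge(a: Person, b: Person) -> Person:
    """Create a new person one level above the higher of the two inputs."""
    return Person(level=max(a.level, b.level) + 1, built_from=[a, b])


def revise_note(merged: Person, old: Note, new_text: str) -> Note:
    """Attach the corrected text to the merged person; the old note is kept, only linked."""
    new = Note(new_text)
    old.superseded_by = new
    merged.notes.append(new)
    return new
```

The old note's text is never overwritten, in line with the principle that an update should not destroy a previous version of the data.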

OK - objection 1 is that the user will not review their notes and it'll remain ambiguous. Fair comment - but this is what can happen now, anyway - I can easily create a note whose context is such that it only makes sense in terms of the current information. When I add more PFACTs, the note goes ambiguous because the context has changed. It's no different.

The key here is that the information WILL all appear on the screen in one place, against one person, thus enabling review. In earlier months, I was not thinking along those lines, hence (IMHO) a lot of the confusion about what users would see and need to do.

Objection 2 is that the E&C model will encourage more data going into the application's database in a random fashion for subsequent linking, thus increasing ambiguity of context sensitive notes. I don't think this is going to happen in practice. I'm not going to go to the trouble of entering a baptism (say) unless I'm pretty confident that this will be one of my relatives - in other words, most entries into an application will still go for immediate matching with someone already there, reducing the time interval where context sensitive notes can be created.

In summary (finally) - a well-designed E&C Model application will (IMHO) enable users to see all the data linked to a human being and enable them to detect and correct context-sensitive notes as they appear.

AdrianB38 2011-05-03T03:04:36-07:00

Geir - re your statement that "The best solution for the user is that the importing program does the merging" and your 2nd suggestion for getting round it:

I take your point that there is an issue about the user needing to know - and not knowing - whether the recipient app is E&C model compatible or not - this is a real issue. But if we are going to accept that there will be apps that are BG compatible EXCEPT that they don't understand the E&C model, then we simply cannot expect those apps to be able to level-down the E&C BG file to be C-only. If they understand it enough to level the import down, then they understand it nearly enough to make the app E&C compatible in the first place. But it isn't....

I suggest a couple of possibilities to get round the real issue that you suggest:
- provision of level-down utility programs. This is probably not a good idea since it still depends on the user knowing when to use one.
- provision in the BG file of TWO sets of data - one is E&C multi-level; the other is single level C-only. Which is exactly what you suggest above in your final para, so it must be a good idea (grin).

testuser42 2011-05-03T14:44:59-07:00

Lots of good stuff...
Just a quick thing I thought about:
Your example: "The child in the >>>1787<<<< christening record is most likely the same as Peter mentioned in the >>>1850<<< probate, since ...."
Maybe a note could hold links to the sources? Here, the words "christening" and "probate" could have been linked to the source records when the note was first written. Then it's not that ambiguous. A really smart software might follow the link and offer something like the "short title" of that source to appear in reports.
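
A small sketch of what such linked notes could look like. The `[[id|word]]` markup and the `Source` class are invented here purely for illustration; BG defines no such syntax.

```python
import re
from dataclasses import dataclass


@dataclass
class Source:
    short_title: str


def render_note(text: str, sources: dict[str, Source]) -> str:
    """Replace each hypothetical [[id|word]] link with 'word (short title)'."""
    def expand(match: re.Match) -> str:
        source_id, word = match.group(1), match.group(2)
        return f"{word} ({sources[source_id].short_title})"
    return re.sub(r"\[\[(\w+)\|([^\]]+)\]\]", expand, text)


sources = {"S1": Source("Baptisms 1787"), "S2": Source("Probate 1850")}
note = ("The child in the [[S1|christening]] record is most likely the same "
        "as Peter mentioned in the [[S2|probate]].")
rendered = render_note(note, sources)
```

Because the link travels with the note, the rendered text stays unambiguous however many other christenings and probates are later attached to the same person.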

Could it be possible to derive context by just looking at _where_ the note is contained? We put that note there not just for fun but for a reason ;) But maybe that's really too difficult, especially when the note is for a Person that combines many different Records.

The double data set sounds like a very practical and smart idea!
I'm sure people will write conversion tools, too (e.g. for helping people to squeeze most of BG into oldGedcom) but this is easier for software programmers...
Filesize won't be much difference, it's just the textfile that gets doubled, not the big things like images.

testuser42 2011-05-03T14:55:12-07:00

Sorry, I overlooked Adrian's sentence: "Geir's idea about merging up notes is interesting - I think it would only apply to notes that are linked just to a person. Any note linked to an event or an attribute would carry the context with it."
That's what I thought when I wrote that 2nd paragraph.

For Notes directly in Persons, I think the merging up is smart and somehow fits right in with the overall principle: If there's a "clash" between two things, the user needs to be asked to resolve it. The solution is recorded in the new Record, the previous stuff is left alone.

gthorud 2011-05-06T17:27:46-07:00

I can just say amen to what Adrian wrote in his second last posting May 3 11:54. (The climbing note applies to the person.)

Testuser proposes links to sources, yes, that is already in the Requirements Catalog, so I agree. Depending on the capabilities we develop for citations, a short title or whatever citation element could go in an inline citation or it could be just a number referring to a footnote. In notes attached to events there are also programs that let you insert “variables/fields” that will present (variable) info from an event, eg date or name (some progs allow the name from the source to be used in the event) in the note, but this auto updating does not export in Gedcom, at least not in a way that other progs will understand. (Might be a feature in BG – it might help get rid of some robot language.)

But, I have discovered a problem with the “climbing note”. Say it has climbed from level 1 to 10: what happens if you remove a person (or change something) at, say, level 3 (somewhere low in the hierarchy)? This is likely to affect the content of the note, and you may correct that at level 10, but there are probably also copies of the text at level 4 through 9. The user is not likely to want to correct all those.

I am going in circles on this issue.

I think we agree that a reasoning note must provide context, one way or another, no context will cause problems in merging to one level and also in reports. So everything has to be based on this.

Another idea is to “merge” the original reasoning notes at each level with the idea of a climbing note, in the user interface - "Climbing notes advanced version". At eg. level 5 you would see the reasoning notes for all levels 1-5 in a window, concatenated but still separate.

If you make a change in the note at level 3, that change will appear at level 5 automatically, and the other way around (if you edit it at level 5). (I assume the user is referring to sources and not making use of the “level context”.)

If you remove a lower level (3), the corresponding part of the concatenated note at level 5 will turn grey (no longer valid).
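
One way to picture the "advanced version": each level owns one note part, and the view at a given level is simply the list of shared parts up to that level. The `NotePart` and `ClimbingView` names are assumptions for the sketch.

```python
from dataclasses import dataclass


@dataclass
class NotePart:
    level: int
    text: str
    valid: bool = True  # False would be shown greyed out in the user interface


class ClimbingView:
    """Concatenated but still separate note parts, shared between all levels."""

    def __init__(self, parts: list[NotePart]):
        self.parts = sorted(parts, key=lambda p: p.level)

    def view(self, level: int) -> list[NotePart]:
        """Parts visible at `level`: everything written at that level or below."""
        return [p for p in self.parts if p.level <= level]

    def remove_level(self, level: int) -> None:
        """Removing a level keeps its part but marks it as no longer valid."""
        for part in self.parts:
            if part.level == level:
                part.valid = False
```

Since every view returns the same `NotePart` objects rather than copies, an edit made at level 5 is the edit seen at level 3, which avoids the stale copies at levels 4 through 9.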

And in-between the parts tied to each level you could be able to add text not linked to a level (but this text would still have the problem of the climbing note mentioned), and you could even be allowed to change the order of the parts linked to a level so they are not concatenated in the order created. (The latest two things will require support in BG.)

But, there would still be limitations – text cannot cross “note part boundaries”, unless you allow those parts to be broken up into eg sentences.

You could even, at level 5, highlight the events associated with a level by clicking in a note part (or sentence) linked to that level (and pushing a button or whatever).

Maybe I have gone too far in complexity? I would not be surprised if there are problems I have not seen.

AdrianB38 2011-05-05T12:27:16-07:00

Reqts for E&C 1 - Codifying Source Info

Perhaps one issue is that we have never written down the user requirements for the Evidence & Conclusion Data Model (E&CM). In the Requirements Catalogue, we have just 2 Evidence** requirements, viz:
Evidence01
Title - Evidence & Conclusion Model
Description - BetterGEDCOM could handle evidence and not just conclusions
(followed by a lot of cross references) and

Evidence02
Title - Proof Argument and/or Process
Description - BetterGEDCOM should support users need to record and share proof arguments supporting and/or supported by the evidence and conclusions therein recorded or shared.

Note the last does not (at first sight) relate to the E&CM but seems to me to be hugely important to the process as a whole.

I'd therefore like to attempt to document 2 slightly more basic (and less self-referential) requirements.

The first is perhaps a bit easier to write than the 2nd, is less linked to E&CM and is, in fact, perhaps not even possible.

AdrianB38 2011-05-05T12:48:42-07:00

This is the first (draft) requirement. It is related to evidence but peripheral to E&CM:

Title: Codification of Information in Sources

Description: BetterGEDCOM could provide the ability to codify the information contained in a source in a machine readable form that could be interpreted by an application program. The stored information should be a faithful equivalent to the relevant contents of the source.

Importance: Desirable.

Why?: Numerous people have indicated their desire to encode information from sources. If encoded in a form readable by an application, that application MIGHT (depending on many other things) be able to
- provide a more precise trail of the arguments that have been gone through;
- detect anomalies between source and person / family / other-entity;
- propose values for PFACTs of, or relationships between person / family / other-entity;
- allow drag-and-drop between information in sources and in persons, etc.

Way forward?: Three possibilities spring to mind:

1. Take the source-text of a source and mark it up, showing where the information is.

2. Create a record layout for each type of source. The record will contain the information values from that source, each item in its own field. E.g. A British Census Record could be designed to contain all the _information_ from such a census - if XML were to be adopted for BG, it would be logical to suggest the British Census Record would be an XML format, held within the XML for the relevant Source record.
Cons: As the number of Source types approaches infinity, the number of record types for the source information similarly approaches infinity.

3. Since all the relevant information from a Source record of any type will map onto an individual person, family, place, etc, use the formats of those objects to store the information from the Source. Such a use of a Person record format to encode the information from a Source could be referred to as a Persona.

AdrianB38 2011-05-05T12:52:19-07:00

Now - before you get all excited over option 3 (as I did), start thinking please about these aspects:
- how do you encode a signature?
- is it sensible to encode ages on a census in the form of Birth Events against a Persona?
- is it sensible to encode assumed relationships in a family, taken from a census that doesn't have explicit relationships, as actual relationships between Personas? If not, how would you encode them?

gthorud 2011-05-07T15:54:31-07:00

It seems to me that there are several uses for codified information, the most important is probably to be able to organize and search for information. That requires the most important info from the source to be recorded in an event as we know it. And some programs are able to detect some anomalies based on this, and some services suggest “new information”, but the latter suffers from proposing a very high (80-90%?) degree of irrelevant info – the problem being that there are a LOT of persons that satisfy the criteria.

It seems to me that the proposed “codification” requirement wants to take this a step further?

Living in a place where a lot of information has been transcribed, and is available free on the internet, I find it more useful to do a manual search in well-designed databases (separate from my genealogy program, based on standards for codification of censuses and church records) rather than have my program do the job. Also, what is left of the fun in genealogy if a program were to do the job?
Three alternatives are proposed. I don’t see them as alternatives, but rather as possible complementary solutions.

Re. 1. I assume BG will allow recording of text from source text and translations of such - separate from the E&C persons. Marking up the text could result in data stored in events or structures as in alternative 3 and 4. Aside from that, I am not sure I see the big benefit of recording exactly where in the text that structured data is derived from (assuming that the recorded text is not too large – you could even mark up the important bits and pieces – there is one prog with that feature).

Re. 2. (There is a posting somewhere on the wiki where someone has developed a solution for recording of UK censuses, and I think I have seen programs for US censuses.) Having seen a couple of standards for census and church record recording, the latter done in XML, I know that it is a major task to define such things. Given all the other types of sources, Evidence Explained (for US citations) would be a tiny thing compared to what you would have to create to describe the info in the sources.

If you were to do it, you would have to create a general solution that could be adapted to specific sources by source type specific templates, partly based on a harmonized set of the most important “data elements” e.g. surnames, birth place, place so you could do things with data across source types. You could use this general format for downloading transcribed records from services on the internet. You could have a separate part of your program for storage of structured sources, and functions for converting the relevant info into events.

Finally there exist programs that will take a transcription of certain source types (in tabular or xml format) and convert some of the info into Gedcom, so you can load it into a program and merge it with your genealogy data, and/or keep that in a separate data set/project/? and link it to your data.

I see alternative 2 as a possible separate requirement, but not as anything integrated with persons in an E&C tree.

Re. 3. Probably not more to say about this, than it is the way we have discussed for E&C for a long time. If there are concrete examples of additional info from sources that we cannot capture in events etc., and that could be useful in a codified structure, we could discuss that.

How do you encode a signature – in an image of the source. I don't understand the issue here.

Using age to record a birth – there is no other way to record a birth. And you also encode the age in a census event, but it will not handle all info in the census. It is an interpretation of the evidence. (Gene does not like the term Evidence person (the evidence is elsewhere) and she is probably right, but that is a different discussion.)

Re. person relationships from censuses – if the relation is not recorded in the census it may not be appropriate unless you base it on assumed recording practices or rules. You should in addition record relations in a census as roles – named after roles in the family or household.

In summary, I don’t see us recording a codified structure that captures ALL the info faithfully in the structure of E&C persons.
Some of the info in that structure will have to be an interpretation (, and some of the info will be recorded in other structures than person-structures).

But, I may not have understood the whole issue raised?

AdrianB38 2011-05-08T05:25:05-07:00

Geir
I agree with pretty much everything you say.

Yes, codifying information makes it easier to handle for searching and organising etc. But equally, the signal to noise ratio on any app using this "other" info is not inspiring (c.f. your 90-80% irrelevance rating).

Some specifics: "How do you encode a signature? – in an image of the source" Sure - but that's only storing it. It can only be interpreted by the human view and it could be interpreted where it is in the source-record.

"encode the age in a census event" - this is one of my concerns. I found myself writing down the ages on each census in the citation data, so that I could easily scan the sequence of ages:
- 2, 12, 22, 31, 42, tells me easily that this chap and his family are pretty consistent and therefore (pious hope!) trustworthy.
- 2, 12, 20, 34, 39 tells me I really ought not to be too trusting!
If these ages are translated to their GEDCOM equivalent of
- born btw June 1838 and June 1839, born btw April 1838 and April 1839, etc, then life gets trickier to read.
And I really don't like to think about the interpretation that goes into a US or Canadian census where the date of the data actually being written down can be weeks away from the supposed effective date.
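
The translation from a recorded age to a birth range is simple arithmetic, sketched below. The function name is made up, the census dates in the usage note are assumed for illustration (6 June 1841 and 30 March 1851), and the age is taken to mean completed years on the census date, which is itself an interpretation.

```python
from datetime import date, timedelta


def birth_range(age: int, census_date: date) -> tuple[date, date]:
    """Earliest and latest birth dates for someone aged `age` completed years on census_date.

    Limitation of this sketch: a census_date of 29 February is not handled.
    """
    # Latest: the person turned `age` on the census date itself.
    latest = census_date.replace(year=census_date.year - age)
    # Earliest: the day after the birthday that would already make them `age + 1`.
    earliest = census_date.replace(year=census_date.year - age - 1) + timedelta(days=1)
    return earliest, latest
```

With those assumed dates, `birth_range(2, date(1841, 6, 6))` gives 7 June 1838 to 6 June 1839, and `birth_range(12, date(1851, 3, 30))` gives 31 March 1838 to 30 March 1839. The derived ranges are correct but much harder to scan than the plain sequence 2, 12, 22, which is an argument for keeping the recorded age as well as the derived birth event.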

I think you sum it correctly up with "I don’t see us recording a codified structure that captures ALL the info faithfully in the structure of E&C persons. Some of the info in that structure will have to be an interpretation"

This is no doubt me being a rigorous logician - but if what I'm getting is a pragmatic, slightly inexact translation that still means I have to go back occasionally to the real source, then I'm - disappointed with the persona idea.

louiskessler 2011-05-12T18:29:14-07:00

1. Take the source-text of a source and mark it up.

No. Very problematic as mentioned above.

2. Create a record layout for each type of source. ... the number of record types for the source information ... approaches infinity.

I've argued before that codifying Mills or Lackey or the next new citation guru that comes along is not the way to go. Leave that up to the individual programmers who want to do that to try to get it right. BetterGEDCOM need only be general and flexible enough to handle any citation methodology.

3. Use the formats of ... an individual person, family, place, etc, ... to store the information from the Source.

This is wrong because one item of evidence (e.g. a census record), will have to be mapped onto several people, a family, a place and who knows what else. You're turning 1 complete and very usable self-contained record (one item of evidence from a census record) into a distributed disarray of bits of info that will be tough if not impossible to reassemble.

So I again stress the correct way is:

4. Codify source info into Evidence records.

louiskessler 2011-05-12T18:38:52-07:00

Geir talks about searching up above. There is another misconception here that hundreds of different tags need to be coded to allow searching to be done properly. That is not correct.

Tags are to classify items into major groupings. You only want tags for major items and then have a miscellaneous tag for everything else, like a description.

e.g., Do you think having 500 tags for every possible event you can think of is better than having about 20 major event tags (BIRT, DEAT, BURI, MARR, CHRI, ...) and a catchall for the rest with a user-defined description: EVEN/TYPE.

Having 500 tags makes it harder, not easier to search. You have to first figure out which are the correct tags. Only then do you search them.

Search should be all-encompassing, and full-text to allow Google-like searches of the data.

And Evidence records could be the single entity that need be searched through.
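
To illustrate the point, here is a rough sketch of a small tag set with an EVEN/TYPE catch-all, plus a full-text search that ignores tags altogether. The record layout (plain dicts) and function names are assumptions; only the tag names come from the post above.

```python
from typing import Optional

# A few of the major event tags named above; the real list would have about 20.
MAJOR_TAGS = {"BIRT", "DEAT", "BURI", "MARR", "CHRI"}


def classify(event_type: str) -> tuple[str, Optional[str]]:
    """Return (tag, user-defined type): a major tag, or EVEN with a TYPE description."""
    tag = event_type.upper()
    if tag in MAJOR_TAGS:
        return tag, None
    return "EVEN", event_type


def search(records: list[dict], query: str) -> list[dict]:
    """Case-insensitive full-text search over every field, whatever the tag."""
    q = query.lower()
    return [r for r in records
            if any(q in str(v).lower() for v in r.values() if v is not None)]
```

A search for "apprentice" finds an `EVEN` record with `TYPE` "Apprenticeship" without the user having to know which tag it was filed under.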

AdrianB38 2011-05-13T05:18:42-07:00

Louis - I note one aspect from your posts above that I'd missed before, which is that your evidence record is just one per source record.

So would it be the case then that your evidence record for (say) a census form for a household of 6 people would have (say) 6 different NAMEs; 6 different AGEs, one PLACE (or ADDRESS) and then a number of different free-format text lines recording the stuff that _doesn't_ make the top-20 - or would ALL the information from the census get recorded in the free-format text and the top-20 are, effectively, just there for searching, not encoding?

louiskessler 2011-05-13T06:43:18-07:00

Adrian:

See this post of mine in my discussion with Tom for an example:

http://bettergedcom.wikispaces.com/message/view/BetterGEDCOM+Comparisons/32431138?o=60#32726136

AdrianB38 2011-05-05T13:12:29-07:00

Reqts for E&C 2 - Permanent Record of Evidence and Conclusions

This is the 2nd (draft) requirement. It is the heart of the Evidence & Conclusion Model (E&CM). At least, it's what _I_ think the heart is.

Title: Permanent Record of Evidence and Conclusions

Description:
Scope - data relating to a real-world genealogy entity, specifically a person, family, group, place or other miscellaneous entity such as a ship or other artefact of historical interest. This includes relationships between those things.

Requirement - any time that data or a relationship relating to a real-world genealogy entity (defined as above) is created, updated or deleted, then BG should provide the ability to permanently record the state of that real-world genealogy entity / entities BEFORE the creation / update / deletion and AFTER.
It must be possible to navigate between superseded real-world genealogy entities and their replacements.
It should be possible to relate those changes to any relevant proof and / or sources and vice versa.

Why?: Only by keeping a permanent record of the inputs to and outputs from any piece of research (is this the correct term?) will it be possible to correctly review whether a conclusion is still correct if it is necessary to adjust the interpretation of some evidence somewhere. Only by keeping a permanent record will it be possible to identify conclusions at risk if it is necessary to adjust the interpretation of some evidence somewhere.

My note - this is, to me, the core requirement that the E&C Model tries to fulfil. It's my first attempt at writing it down in a format less than a page so is probably somewhat off track - comments please?
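
One possible reading of this requirement, sketched as a chain of immutable versions: every update creates a new version that links back to the one it supersedes and names the source that prompted it. The `Version` class and the helper functions are hypothetical, and this is one way to meet the requirement rather than the E&C model itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    values: dict
    source: Optional[str] = None          # source cited for this change
    previous: Optional["Version"] = None  # the superseded state, kept permanently


def update(current: Version, changes: dict, source: Optional[str] = None) -> Version:
    """Return a new version; the superseded one is linked, never overwritten."""
    return Version({**current.values, **changes}, source, current)


def history(version: Optional[Version]) -> list[Version]:
    """All states from the current one back to the first."""
    chain = []
    while version is not None:
        chain.append(version)
        version = version.previous
    return chain


def introduced_by(version: Version, key: str) -> Optional[str]:
    """Source cited on the version where the current value of `key` first appeared."""
    current = version.values.get(key)
    origin = version
    while origin.previous is not None and origin.previous.values.get(key) == current:
        origin = origin.previous
    return origin.source
```

This only navigates from a replacement back to what it superseded; going the other way would need an extra index or a forward link.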

AdrianB38 2011-05-06T04:18:35-07:00

An alternative form of this requirement might read:

Title: Permanent Record of Evidence and Conclusions

Description:
Scope - data relating to a real-world genealogy entity, specifically a person, family, group, place or other miscellaneous entity such as a ship or other artefact of historical interest. This includes relationships between those things.

Requirement - At any time it should be possible to see the full details of all research steps that have ever contributed to a value held against a real-world genealogy entity (see above). In particular, it must be possible to see (directly or indirectly), for each step:-
the details of the evidence in the sources concerned;
the details of the relevant proof argument / proof summary / conclusions;
the details (of other entities) that were used at the time as inputs to the research step;
the updated values emerging as outputs from that research step.

Why?: Only by keeping a permanent record of the inputs to and outputs from any piece of research (is this the correct term?) will it be possible to correctly review whether a conclusion is still correct if it is necessary to adjust the interpretation of some evidence somewhere. Only by keeping a permanent record will it be possible to identify conclusions at risk if it is necessary to adjust the interpretation of some evidence somewhere.

AdrianB38 2011-05-09T09:00:44-07:00

See http://bettergedcom.wikispaces.com/message/view/Goal+Oriented+Research/38505748
Gene said "I know some users now don't abstract or extract bits into their working file reference notes. When they find another item of evidence, they either enter another source to the same tag (again without the abstract/extract) or as a source to a new tag. Over time, because they haven't kept that "snippet" in the reference note, it's pretty easy to lose track of the different information from the different sources."

AB replied "Exactly. Been there, done that, got the T-shirt of confusion.

"And that is precisely one of the issues that the E&C Model attempts to deal with."

This req't "doesn't quite mention that aspect but I realise now that is a side effect of the requirement. Or is it a side-effect of the solution?

"Either way, it would be possible for the user to drill down from the list of sources cited against a PFACT or Event. Because the record of previous values is permanent, at each level down you see the previous values and the previous cited sources. At some point in your descent you see the bits change and it should show therefore which cited source triggered the change.

"It's probably not simple to design a decent GUI for, but the E&C data model enables it."

gthorud 2011-05-09T16:55:38-07:00

It seems to me that this requirement goes beyond what the original E&C-model could do. It was not able to record every little change you made, and keep a record of the state of everything (incl other persons etc) before you did that change. If, say, you removed an EP or CP from a tree, because you decide it is a different person, the E&C model had no way to record the previous state of the database. Also, if you change a date or whatever because it was not recorded correctly, perhaps deep down in the tree, there was no intent to keep the previous date. I am not sure I see all the implications of this requirement, but it could for example require addition of new E/C-persons for every little change you make, simply adding or changing a word – or it would require many versions of the same record. So, my question is, do you envisage changes to the model to fulfill these requirements?

If you interpret a source in a new way, it should be possible to find the citations referring to the piece of that source and see the current (previous) interpretations of that, and correct them – you do not need the whole history of the database.

AdrianB38 2011-05-10T08:43:24-07:00

Geir
The possible impact of recording every little change worries me also. But what else is the tree of evidence and conclusions for, if not to be able to go back and forth, seeing how we got here?

Couple of things:
1. This (my requirement) is not designed to be a back-up of the database as a whole. It's much closer to a transaction log. (And it's at least arguable that storing before and after versions in a transaction log, outside the main "database", would suffice.)
Somewhere I want to draw a line round what's kept and what's not. What constitutes a material change? We probably keep previous versions of persons, families, groups, places and "ships". Not sure about sources - I have a feeling if we don't, then it's a problem. But we wouldn't keep previous to-do lists. Not sure about "proof statements".

2. Part of the driving force here is my total conviction that if we rely on the user to create the next level of the evidence-conclusion tree, then it won't happen in 99.9% of the cases, in which case E&C Model is a waste and the Wiki might as well go onto decoding all 1,500 or whatever it is variations of the EE citation formats.

So - we need to "specify" that we envisage the application doing exactly that (creating the next level) at the "appropriate" time. Automatically.

Problem is - defining what that "appropriate" time actually is. What is material? Any merge of a person with another (be that a real person or a persona for a single source), is clearly a reason to generate the next level. Logically, any change requiring a source citation is the other, as if we just adjust some notes or correct a spelling, how does it matter?

But we know very well what happens out there - 95% of users make a material change first and _then_ paste the citation onto it. If then. Cynic? Me? In this case, how will the app know it's a material change? Not until it's already been done. So the app simply HAS TO apply maximum worry and save it even if it's just correcting a spelling mistake.

No, I'm not sure I like it either.

3. "keep a record of the state of everything (incl other persons etc) before you did that change" Only those that affect this person. Or this person affects. But you're right, through event links, that might be a lot.

4. "If, say, you removed an EP or CP from a tree, because you decide it is a different person, the E&C model had no way to record the previous state of the database" But surely it DOES know the state of the previous persons (or families or groups etc) in that tree - it just looks down below the removed person - they're in their previous state, and then offers you what's above the removed person to say "Are these still valid?"

5. "Also, if you change a date or whatever because it was not recorded correctly, perhaps deep down in the tree, there was no intent to keep the previous date" No, that's true - I'm no great fan of keeping discarded stuff. But how do we distinguish this from a material update - e.g. adding in a new source with a new baptism? Surely we did envisage creating a new level in the tree for that? We certainly would if we'd created a persona for that baptism?

6. "do you envisage changes to the model to fulfil these requirements?" No because it was my attempt to justify why E&C DOES create that tree. What the heck IS it for if we can't see the previous story we used to get here? All alternatives gratefully accepted!

OK - I said a "couple of things" and I'm up to 6, so let's leave it there!
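
A minimal sketch of the "transaction log" idea from point 1 may help make the discussion concrete. It assumes the application saves every change (as point 2 reluctantly concludes) and that the citation may be missing at the time of the change. The class names, field names and sample values are hypothetical illustrations, not part of any BetterGEDCOM proposal.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Change:
    """One log entry: the value before and after, plus the citation (if any)."""
    record_id: str
    field_name: str
    before: Any
    after: Any
    citation: Optional[str] = None  # users often paste this on later, or never


@dataclass
class LoggedStore:
    """Hypothetical store that logs every change, material or not."""
    records: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    log: List[Change] = field(default_factory=list)

    def update(self, record_id, field_name, value, citation=None):
        record = self.records.setdefault(record_id, {})
        # Save the previous value before overwriting it.
        self.log.append(
            Change(record_id, field_name, record.get(field_name), value, citation)
        )
        record[field_name] = value

    def history(self, record_id, field_name):
        """Drill down: newest change first, each with the source that triggered it."""
        return [c for c in reversed(self.log)
                if c.record_id == record_id and c.field_name == field_name]


# Sample values below are invented for illustration only.
store = LoggedStore()
store.update("P1", "birth_date", "1850", citation="census entry")
store.update("P1", "birth_date", "12 Mar 1849", citation="baptism register")
for change in store.history("P1", "birth_date"):
    print(change.before, "->", change.after, "| cited:", change.citation)
```

Keeping the log outside the main records, as here, matches the suggestion that before-and-after versions stored in a separate log might suffice; deciding which record types are logged at all is the "line" still to be drawn.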

GeneJ 2011-05-06T06:32:25-07:00

Types of "records" you might consider using as examples down the road

Creating a discussion place holder to add different "records" you might consider using for examples at such time that you are closer to explaining the persona process.

AdrianB38 2011-05-07T08:20:56-07:00

Do we need Personas?

Does the Evidence and Conclusion Model (E&CM) need Personas (a.k.a. Level-1 evidence persons)?

This question arises from my attempt at defining the requirements that the E&CM is attempting to fulfil. Let me say straight away that my gut reaction is that E&CM is a good thing. Nevertheless I do find myself wondering about personas.

Definition - Persona here is used in the LDS nFS sense of a record conveying details of one individual from one source. And that's all.

The relevant requirement is drafted in http://bettergedcom.wikispaces.com/message/view/Defining+E%26C+for+BetterGEDCOM/38617826
Title: Codification of Information in Sources
Description: BetterGEDCOM could provide the ability to codify the information contained in a source in a machine readable form that could be interpreted by an application program. The stored information should be a faithful equivalent to the relevant contents of the source.

Of the 3 possible ways forward listed on that post, option 1 strikes me as impossible, option 2 is a never-ending task and option 3 is "use of a Person record format to encode the information from a Source". In other words, option 3 is (I THINK) why we have Level-1 Evidence Persons for every source.

If we go down the route of option 3 and have Level-1 Evidence Persons, a.k.a. Personas in the nFS sense, then the act of entering the details of a source is indelibly linked to entering the information from that source into a form that looks like a person's form - this would need to be automatic to be effective - creating a level-1 evidence person, a.k.a. persona.

BUT.... My problem with the persona is the degree of interpretation that's necessary to create it. I can see myself looking back past the persona to the original source record to see exactly what's on the census, what the signature looks like, etc. In which case, is there any point to ALWAYS creating a persona?

Why not just use the source record instead and give up on the "Codification of Information in Sources" requirement?

We would still see trees of persons (one tree per human being if we've sorted it all out). At any point, one would be able to look at the topmost person in the tree and see where the information comes from. It's just that the very bottom of the tree is not a persona, it's a source record.

Comments?

louiskessler 2011-05-13T06:47:24-07:00

Geir,

I have always disagreed with Tom on the need for Evidence persons as I think they are an unnecessary complication that will contribute more confusion than help.

See the midst of my discussion with Tom about this at: http://bettergedcom.wikispaces.com/message/view/BetterGEDCOM+Comparisons/32431138?o=60#32726136

Louis

louiskessler 2011-05-13T06:49:31-07:00

Sorry. That was to Adrian, not Geir.

gthorud 2011-05-14T16:11:14-07:00

I have read and re-read a lot of postings. Reading the meta discussion with metaphors about how genealogist work, yet another time, does not help.

In the posting referenced in Louis’s last post there is something I don’t understand: how 7 evidence people are derived from the 5 records, and what the “exponentially increasing” problem is. Also it is not clear to me how the evidence would be represented – as in Louis’s posting above? (If I have understood that example it would duplicate place and date, so the Evidence record example is probably not complete?)

A major concern seems to be that “codified evidence records” should represent the source accurately, the user should be “protected/prevented” against entering something that is based on interpretation or reasoning, and the evidence records should be able to exist on their own without links to other evidence. So how do we achieve that?

You cannot easily (never) prevent the user from entering interpretations. Even if you, as Adrian suggests, “create evidence records containing non-interpreted, abstract-only data, keyed for searching on these "top 20" characteristics”, you will still have the birth date among those 20, so the user will be able to calculate it from an age. Whatever you call the data records, it does not control the user, but some info in the data (not necessarily in the record) can tell that the record is supposed to represent the data in a source, no interpretation. I have trouble seeing what problem the “top 20” approach solves in this context.

What you can do is to implement things in a program that make the user aware of the fact that the record should contain evidence only: you can use half the screen to flash “Evidence” in red when the user enters or views a record that should contain source data only, or preferably do it more discreetly. You can even “lock” the record - preventing updates to it, unless the user presses a button to go into “evidence/source update mode”. If you really want to try to prevent interpretation, you could create templates (forms) tailored to each source type, presenting the user with only the fields in that source type, and have rules for encoding (conversion) of the entered data in events etc. – but I think that is a whole separate discussion – and having been involved in the transcription of a large number of sources over the years I am sure that even such a solution will not prevent interpretation 100%. (To be really safe, you could also provide a “copy” function that would copy evidence about a person into an E&C person record – but that is not a good idea for several reasons.)

Once you have some indication in the data that the record should contain “evidence only”, the rest is up to the program and the user in order to ensure that.

So how do we indicate in the data that the record is evidence? A simple way is to link the record to another record holding or referencing the results of a “Lookup” in a source, which could also hold (or link to) free text transcripts of the lookup, multimedia or a summary or etc. The lookup record will link to a source record, a repository record, to a task record, and it will tell you “Where in source”. When a link from a record holding the “codified evidence data” to such a lookup record is present, you can flash “Evidence” on the screen when the “codified evidence record” is viewed. These “codified evidence records” can together with Lookup, Source etc. records be present in a BG-file without links to records providing interpretation/reasoning, thus providing a way to record and distribute evidence only.

But, the same “codified evidence record” can – assuming it is about a person – serve as an evidence person record (the old EP). If it is NOT linked to a superior conclusion person, it will appear as any other person (although “flashed as evidence”). If it is linked to a superior conclusion person it will be part of an E&C-tree as an evidence person, but still “flashed as evidence”. Thus, the record can (but does not have to if it is evidence only) have “superseded” (excluded, demoted) markers attached to the bits and pieces in the record (see separate discussion on this wiki page), the marker being a link to a superior conclusion person where it was decided to hide the info in the subordinate record. The superior conclusion person could possibly hold an interpreted variant of the superseded info, perhaps merged with other evidence. Since the evidence person record type is almost the same as that of a conclusion person, it can also be extended with a pointer (multiple occurrences) to a subordinate person record and a “climbing note” (which in its simplest form can work as a simple note, see other discussion on this wiki page).

What do we call the record type representing the evidence for a person and possibly at the same time an EP, and which can also be used for CPs? The simple answer is “Person record”.
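
The record structure described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class and field names (`Lookup`, `Person`, `where_in_source`, `climbing_note`) are hypothetical, repository and task links are omitted, and the "Evidence" flag is derived purely from the presence of a link to a lookup record, as the posting suggests.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Lookup:
    """Result of a lookup in a source (field names are assumptions)."""
    source_id: str
    where_in_source: str
    transcript: str = ""


@dataclass
class Person:
    """One record type serving as both evidence person and conclusion person."""
    name: str
    lookup: Optional[Lookup] = None  # present => record should hold evidence only
    subordinates: List["Person"] = field(default_factory=list)
    climbing_note: str = ""

    def is_evidence(self) -> bool:
        return self.lookup is not None


def label(person: Person) -> str:
    """What a program might display; the flag follows from the lookup link."""
    return ("[Evidence] " if person.is_evidence() else "") + person.name


# Hypothetical sample data.
census = Lookup("S1", "page 12, line 4", "Johann Kessler")
ep = Person("Johann Kessler", lookup=census)
cp = Person("Johann Kessler", subordinates=[ep],
            climbing_note="Taken to be the same person as the census entry.")
print(label(ep))
print(label(cp))
```

The evidence person can exist on its own, without any superior conclusion person, which corresponds to distributing "evidence only" in a BG-file.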

There is nothing preventing us from including advice/suggestions/rules to programmers in a BG specification how they could present evidence persons to users, trying to prevent interpretations/reasoning if we have only one person record – as discussed above (e.g. “flashing”). I don’t see a need for having a dedicated evidence record (or whatever) that will constrain what the user can do. Also, because a person record type as described above may contain information that is only used when the record is linked to superior and/or subordinate persons in an E&C tree, there must be rules stated about when and how to record/use this information – again, no need for special record types.

Finally, and very important, one of the most important features for recording a reasoning is citations. Citations will go into (or rather be linked to) a lot of places, in events, in notes and can also be attached to a role etc. A citation must be able to present the result from a lookup as e.g. a summary or extract so it will have to link to a lookup record and through that to the codified evidence record, e.g. the person record mentioned above – so you can get at the evidence that way also. (I believe this is what Louis wants in his posting referred to above).

What I have written above may be too complex to understand, but draw some records on paper as you read through it. I will try to post a more complete record structure tomorrow.

If we want to present the evidence in a structured form that looks more like the evidence in some sources, for example like in a census, and be searchable, one way to do it is to create a “meta database” for storage of table structured information as I have mentioned in Req Cat Data09. There you can have some of Adrian’s top-20 characteristics as columns, selected to fit the source type. But, this has to come in addition to the information encoded in events etc.

gthorud 2011-05-15T09:27:51-07:00

See my last posting above.

I have created a sketch of how parts of some records could look like. There are some additional ideas in the file as well.

The E&C functionality is tied into records for recording evidence and to citations as well, so I put everything in one document.

Geir -- Person - Extract - Lookup - Task - Citation etc Records.pdf

gthorud 2011-05-15T09:29:10-07:00

New attempt

Geir -- Person - Extract - Lookup - Task - Citation etc Records.pdf

gthorud 2011-05-15T09:31:36-07:00

There is a diagram at the end of the document

louiskessler 2011-05-15T09:44:19-07:00

Geir,

The exponential growth simply refers to the number of possible ways you can combine similar people into evidence people. E01 Johann Kessler, E02 Yohan Kesler and E03 Johann Kesler might be seen by one person to be 3 different people (that's 3 EPs). Someone else might think E01 and E02 are the same, or E02 and E03 are the same, or E01 and E03 are the same (that's 3 more), and someone else again might think E01, E02 and E03 are all the same. That's 7 different ways of combining (or not combining) evidence into EPs, and it depends on interpretation. That's why I don't like it. They are NOT Evidence People any more. They are conclusion people because some interpretation was applied. (4 items of Evidence lead to 15 possible EPs, 5 lead to 31 EPs, etc., which is exponential.)
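
The counts quoted here (7, 15, 31) are the number of non-empty subsets of the evidence items, i.e. 2^n - 1 for n items. A short sketch, assuming each non-empty subset counts as one candidate EP (the function name is hypothetical):

```python
from itertools import combinations


def possible_groupings(evidence):
    """Every non-empty subset of the evidence items; each is one candidate EP."""
    return [combo
            for size in range(1, len(evidence) + 1)
            for combo in combinations(evidence, size)]


items = ["E01", "E02", "E03"]
for group in possible_groupings(items):
    print(group)
print(len(possible_groupings(items)))  # 3 singles + 3 pairs + 1 triple = 7
```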

And yes, evidence should have NO interpretation. I don't want to see other people's interpretation in the evidence. If I want to see that, I'll ask for their family tree data which presumably should include their conclusions and reasoning and links to the evidence they used to support that.

You said: "What do we call the record type representing the evidence for a person and possibly at the same time an EP, and which can also be used for CPs? The simple answer is “Person record”."

No. There is no person record. The source record is simply evidence. The people named in the record/evidence are simply names. No one knows who they are yet.

It is in your own genealogy that you identify people (conclusion people) and decide which items of evidence belong to them.

I completely want to get rid of that middle "evidence person" concept. They are not evidence people. They are conclusions. If you believe 2 items of evidence represent one person and the 3rd item represents another, then you make 2 conclusion persons. But don't think that someone else would necessarily assign them the same way.

Here's the point. If evidence people could be assigned unambiguously and accurately without question, then I'd say great, let's use them. But to add a layer of guesses in between your conclusions and the factual evidence is a horrible unneeded complexity.

Now to say that each person mentioned in each item of evidence is an "evidence person" adds nothing. At the level of evidence, that is just a name, that might be spelled wrong or even inaccurate. Why try to personify a name? All that name is for is to search through to try to identify pieces of evidence that may support your conclusions. Maybe the spelling of your conclusion person's name is one of those things you're trying to pin down.

And don't mix up the meaning of a citation. I had this wrong and Tom corrected me. The citation is simply the formal way you refer to the evidence. You link your conclusions to the pieces of evidence. Each piece of evidence comes from one source.

A source will be a Census roll.
A piece of evidence will be one record on that roll.
The citation would be how you would identify the Census roll and record.

To record a reasoning, you refer (or link) to the evidence. Citations are not linked to them. Citations are simply derived from the source description and the evidence description and follow some formal pattern like Mills or Lackey. But those description parts can be included in the Source record and in the Evidence records.
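
As a rough sketch of that idea, a citation can be computed from the two descriptions rather than stored as a record of its own. The class names and the plain "source, evidence" pattern below are assumptions for illustration; a real program would apply a formal style such as Mills or Lackey, and "record 17" is an invented placeholder.

```python
from dataclasses import dataclass


@dataclass
class Source:
    description: str  # e.g. the census roll


@dataclass
class Evidence:
    source: Source    # each piece of evidence comes from one source
    description: str  # e.g. one record on that roll


def citation(evidence: Evidence) -> str:
    """Derive the citation text; this simple pattern stands in for a real style."""
    return f"{evidence.source.description}, {evidence.description}"


roll = Source("Census roll")
record = Evidence(roll, "record 17")
print(citation(record))
```

A conclusion would then link to the `Evidence` item itself, and the citation text would be regenerated whenever it needs to be displayed.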

I am sure you have as much trouble mapping my ideas to your thinking as I have mapping your ideas to my thinking.

Personally, I think the way to go is that it's already time to start developing a formal data model that extends GEDCOM as needed to incorporate these ideas.

I could do it for my concept of INDIs (conclusion persons), SOUR (source records) and EVID (evidence records), but someone else would have to do it for what they would call the evidence person model.

Then and only then would we be able to test all the abstract concerns that people have been having to see if the model can handle them, or maybe to find out if the concerns were actually of no concern at all.

Then we'll be getting somewhere. Anyone want to start getting somewhere?

p.s. Sorry I can't attend the Monday developer meetings very often, but I only can when I'm off work. Silly thing these day jobs are.

Louis

louiskessler 2011-05-15T09:53:31-07:00

Geir,

Our posts overlapped, but I really like your initial model attempt.

Start a new thread about it (or maybe even something in the data model sections) and everyone can go at it and hammer it out.

Can you put it into some sharable form that we can all mark up with our comments? PDF is not conducive to that.

Despite all the detail you have there, you'll find that I'm going to argue that it is not that much different than current GEDCOM and really only a few extensions to GEDCOM might be needed to address all the concerns of BetterGEDCOM.

Louis

gthorud 2011-05-15T17:10:38-07:00

Louis,

I will start a new discussion of the most relevant issues in the document, but I have some other business to attend to for a day or two. I need to sort of have a clean desktop before we start.

While you are waiting - I assume you have nothing else to do - here are some permutations/instances of the most important records:

E&C with Evidence - permutations.pdf

Are you suggesting that we should start passing marked up documents around, rather than using the discussion facilities on the wiki?

testuser42 2011-05-17T14:43:14-07:00

I find myself mostly agreeing with everyone all at once and one after another. That must mean you're all right :) Or it means I don't see that the differences are that big...
That said, some of my thoughts:

Louis, you said "That's 7 different ways of combining (or not combining) evidence into EPs"
Actually, I always thought that an EP is defined as a Person Record with only one "Source" for the whole of it. That Source would be the single bit of evidence. So EPs would never be combined from more than one piece of evidence.
As soon as a Person Record has more "input" than just that, it turns into a "CP" by definition.
Both EP and CP are just Person Records, EP and CP are just names given for the purpose they serve. EP being the first level of a possible tree.

But when you say
"Now to say that each person mentioned in each item of evidence is an "evidence person" adds nothing. At the level of evidence, that is just a name, that might be spelled wrong or even inaccurate. Why try to personify a name?"
you are right.
Any codified "Person" already is based on an assumption. So in that sense, there is no true "EP". That's been bothering me too, but I've never thought much about it. Strictly speaking, there will always be some interpretation happening: deciphering handwriting etc., but that's unavoidable.

So, I really like the "Evidence Record" you are describing. It's basically what "Codifying Source Info" should be, a kind of smart extract of the evidence from a source.
I always thought something like this would be part of the Source record structure, but having an additional "Evidence" record feels better semantically. It neatly keeps all the info from one source together, and it keeps all the Persons as conclusions - no confusion possible.

Could there still be a need to "Personify" the bits separately? As an option? If there's only one piece of evidence that mentions someone, you'll never combine this into a (conclusion) Person. So that data (Name, Age, Role in Event...) would never be in a "Person" Record, only in the "Evidence" Record.
Could that be a problem (eg for searching)?
Or would that be a good thing actually (since it doesn't make more than necessary out of a passing mention)?

louiskessler 2011-05-17T21:19:48-07:00

Hello mysterious testuser (whoever you may be, but obviously a Hitchhiker fan),

Thanks for the excellent observations.

The way Tom Wetmore was explaining his EPs, he had multiple levels of them and complexities that led to me concluding there were different ways to create them. I'm glad you don't have that concept and have the simple 1-1 idea that I agree with.

Once we have 1-1, then I thought, why have the person at all, when they are embedded in the evidence itself. It is just a duplication.

So when you look at Geir's permutation diagram: http://bettergedcom.wikispaces.com/file/view/E%26C+with+Evidence+-+permutations.pdf
- you'll see how horrible the use of multiple levels of CPs and EPs can become. The model I like best with the Evidence Record is the one called "1 level b".

To "Personify" the bits, I would create new CPs for them. If there is some person who may be someone in my tree or may not, I create a new INDI and attach all the info to it. These people are not connected to the rest of my tree, but I've got it stored for when I need it. Maybe one day I'll find they're the same and then I'll merge them.

So I see no need whatsoever for an Evidence Person. To me it only adds complication.

Louis

ACProctor 2012-01-25T04:29:47-08:00

I am just linking another relevant conversation to this discussion since Tom and I have unfortunately raised parts of the same subject elsewhere:

http://bettergedcom.wikispaces.com/message/view/Data/48419278?o=20#49586040

Issues of interest there, other than a comparison of Personas and STEMMA's equivalent, include a brief comparison of the Persona concept versus multiple instances of a Person. An example of the latter was given where there is evidence of two people physically existing (neither are pure "evidence Persons") but it is unproven whether they are the same Person or not.

I find the whole subject very interesting and I would dearly love someone to write an objective summary of the pros & cons of the different approaches.

Newer people to BG - like myself - and vendors will need such reference material to help the decision-making process. :-)

Tony

AdrianB38 2011-05-10T08:55:40-07:00

"The implications of not creating an EP is that you can not unlink the information by unlinking the EP". Indeed. You unlink the source from whichever person you've decided is wrong; then manually update or delete the data on the person that it's pointing to. Or the person it was pointing to. (So that's probably the _wrong_ order I just typed...)

It's not as clean, there's no doubt about it, as the removal has to be manually done. But given that removal is not a thought-free process even WITH the persona (because you need to think about what might have relied on the data you're removing) I'm not sure that the practical difference is much.

And like I said, if I've loaded my GEDCOM data into my BG compatible database, some 2,900 sources don't have personas, so I have to have code in the app to deal with that.

AdrianB38 2011-05-10T11:18:12-07:00

FYI - these are Tom Wetmore's thoughts on the question of "Do we need Personas?":-

I [Tom W] have asked "What do you want to do with your evidence?" Whether you believe there should be persona records or not seems to come from how you answer that question.

There seem to be three general answers. Let's say all three start with "Add a source record for where the evidence information comes from to your database, and maybe add an image file with that information to your computer."

1. Add the information to a person record in your database and link that information back to the source record. That is, immediately convert the evidence information into the form needed by person records and add it to them. The information is not recorded in any other form except possibly as an image file or a Xerox in your paper files.

2. First add the important part of the information to the source record. The source record might have a standard way to help you do this, or you might have to use a more general note approach. When you decide which person the evidence belongs to, make the person's person record point back to this part of the source record and/or copy the information into the person record. The part of the source holding the evidence makes up part of the citations.

3. First add the important part of the information to your database by transcribing/coding it into standardized records designed for that purpose. These records refer to the source records so you keep the source link. When you decide which person the information belongs to, make the person records refer back to these records; don't copy any information, just add a reference. These records have been called evidence records, and those that hold information about people are the persona records of this thread topic.

Option 1 is often used by people just starting out. When they find new information they decide whether it belongs to a person they are interested in or not. If it does they add it directly to the right person record; if it doesn't they ignore it. There is no need for an alternate location to hold the evidence.

Option 2 is also used by people researching persons they already know, but is preferred by genealogists following good practices and generating high-quality citations. They add the information in two places, first to the source record that describes where the information came from, in the form needed to get their citations; and second in their person records, in the form needed by person records.

Option 3 is useful for people researching further back in time, when they don't yet know the persons they are looking for. They are doing true research, that is, they are collecting evidence that will require much deduction and inference to decide what it means -- they have no a priori way of knowing what persons the evidence belongs to -- the purpose of their research is to decide this. They need to have their evidence information in a standard form that is easily searchable because they must be able to quickly retrieve all their evidence about people with certain name subsets; that have similar properties such as date and place of events; names of parents, spouses or children; places of residence and so forth.
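
To illustrate why Option 3 wants evidence in a standard, searchable form, here is a minimal sketch of persona records and a search over them. The `Persona` fields, the `search` function and the sample places are hypothetical; the names are borrowed from the example given elsewhere in this thread.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Persona:
    """One individual as recorded in one source (Option 3 evidence record)."""
    source_id: str  # keeps the link back to the source record
    name: str
    place: Optional[str] = None


def search(personas: List[Persona], name_part=None, place=None) -> List[Persona]:
    """Return personas whose name contains name_part and whose place matches."""
    hits = []
    for p in personas:
        if name_part and name_part.lower() not in p.name.lower():
            continue
        if place and (p.place or "").lower() != place.lower():
            continue
        hits.append(p)
    return hits


# Hypothetical sample data.
personas = [
    Persona("S1", "Johann Kessler", place="Place A"),
    Persona("S2", "Yohan Kesler", place="Place A"),
    Persona("S3", "Johann Kesler", place="Place B"),
]
print([p.source_id for p in search(personas, name_part="kes")])
print([p.source_id for p in search(personas, name_part="kes", place="Place A")])
```

A person record would then refer to the matching personas by reference rather than copying their information, which is the distinction Option 3 draws against Options 1 and 2.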

AdrianB38 2011-05-10T11:40:33-07:00

A bit of philosophy as well - let me (selectively) quote from the Ancestry Insider's blog on "The Chasm" (http://ancestryinsider.blogspot.com/2011/03/chasm.html extracted 10 May 2011):

Beyond what the AI refers to as "The Chasm", beyond the point when a continuous stream of vital records breaks up, he says: "Genealogists search for records, not people. The people we reconstruct may or may not look like the individuals they are meant to be. If we carelessly frankenstein several individuals together, the result may bear no resemblance to anyone. In essence — and I know I'm going to brew some disagreement when I say this — in essence, we are no longer searching for ancestors; we are searching for records. Okay, maybe that is overstating it."

That's just a quote from the middle.

I commented on the last phrases above, when he said that "in essence, we are no longer searching for ancestors; we are searching for records."

What I said was "No it's not overstating it - I remember in my final year at school, my history teacher summarised history as 'Not the study of events, but the study of records'. At the time I thought he was overstating it - until I did maths at uni and began to learn the importance of understanding the basic principles that you probably didn't think about before."

I think you need to appreciate this philosophy to understand where I'm coming from in some of the questions about what we do with "evidence". You may not agree with my proposals but I think you need to appreciate that:

Fact: Genealogy and Family History are not the study of relatives, but the study of records.

Now, I've seen people disagree with this - they say they don't want to study records, they want to study people. Sorry - but unless you can hitch a lift on Dr Who's TARDIS or on the USS Enterprise when they discover yet another means to time-travel that only works for one episode, then you simply _cannot_ study your deceased relatives. You can only study records - which you don't even know are theirs until after some analysis.

If you want to make the definition less over-stated, then you could (and perhaps should) say:

"Genealogy and Family History are not the study of relatives, but the study of records - in order to understand the relatives."

So, this philosophy lies behind my desire to do something more with records, data, information, evidence, whatever. Being a mathematician I want to formalise it, otherwise why don't we just use a Word Processor? This philosophical meander is not here to justify personas, nor to justify not having them. It _is_ here to say - I want to do something more robust and powerful with records and the information (not necessarily evidence) within.

louiskessler 2011-05-10T20:07:33-07:00

Adrian said:

"Fact: Genealogy and Family History are not the study of relatives, but the study of records."

Genealogy and Family History are different things to different people. In fact, so are the concepts of Evidence, Citations, Conclusions, Personas, Conclusion people, ...

Personally, all this abstract thinking and talking makes my head hurt.

Everyone thinks differently and I have trouble mapping everyone's ideas onto my own - and as you can see by the many conversations here at BetterGEDCOM, meta discussions like these can go on forever.

I'd like to get down to brass tacks. I'd like to see what's wrong in GEDCOM fixed. I'd like to see a simple solution that has enough benefits, and be an easy enough thing for genealogy software programmers to adopt, that they will.

I want to see an Evidence record and a Place record. These two things will extend GEDCOM so that Evidence and Conclusion models, and proper citation transfers - the two biggest complaints about GEDCOM - can be incorporated.

Let's not try to introduce new abstract ideas like personas and evidence persons and multiple layers and whatever. They are for the theorists and people who want to talk about meta-genealogy. People who just want to do their genealogy don't want them, and they don't want their software programs to complicate their life with them.

Louis

gthorud 2011-05-11T12:08:26-07:00

First corrections:
In my first posting above http://bettergedcom.wikispaces.com/message/view/Defining+E%26C+for+BetterGEDCOM/38695952#38781576, first paragraph, second last word: EP should be CP (there is a big difference).

Also my statement about a name in a CP overriding the one in the EP, is wrong, because there can be multiple names. (If you want to exclude the EP’s name you have to mark it as excluded with a link to the level where you decide to exclude it – as has been discussed in another discussion on this page.)

I have started to go through this discussion from the beginning.

Adrian wrote in the first posting: “BUT.... My problem with the persona is the degree of interpretation that's necessary to create it. I can see myself looking back past the persona to the original source record to see exactly what's on the census, what the signature looks like, etc. In which case, is there any point to ALWAYS creating a persona?”
I am not sure if I understand this in the context “Do we need personas?” At some stage you have to start codifying, either in an EP or in a CP. Even if that info is stored elsewhere it has to be codified and linked into the context of an EP or CP. I do not understand why there is a difference between an EP and CP, and why this is an argument to get rid of the EP. You cannot give up the codification if you want it stored in events or other places outside a note.

In this posting http://bettergedcom.wikispaces.com/message/view/Defining+E%26C+for+BetterGEDCOM/38695952#38818406

Adrian wrote: “It's not as clean, there's no doubt about it, as the removal has to be manually done. But given that removal is not a thought-free process even WITH the persona (because you need to think about what might have relied on the data you're removing) I'm not sure that the practical difference is much.”

I agree that if you remove an EP, this may require changes other than removal of the info held in that EP. But if you use that argument for removal of EPs, it also applies to all CPs except the one at the top, so it is essentially an argument to have only one level.

There are several dimensions to this discussion

- Where do you store the info – in the context of an E&C person structure or in the context of the source (I use context to indicate that the info need not be stored IN the person or source record) or is the info stored outside the program on a web server or in a different BG-file

- In what format is the information stored in the program – in events etc. (as in e.g. a person), unstructured text, image or some other structured form other than events etc. (e.g. a table).

- Is the context of information preserved (e.g. a person in the context of a household in a census)

- Is the info codified (interpreted) in events etc. or is it placed in a context that implies interpretation

I will not go through all of these now, but I am trying to summarize some issues. I will just look at what I feel is the key issue.

At some stage you have to codify the info from the source in events etc. suitable for a person. I think it might be worth seeing whether it is possible to store this info “BETWEEN” the E&C person (and place etc.) structure and the source (and citation) structure. If you view this info from the person it will look like an EP (it is linked to from the E&C structures) and if you see it from the source side it will contain the info from somewhere in the source (it is linked to from the source&citation structures). It may contain some info only relevant to one of these views, and some instances may be linked to only one of the two views. A key to making this work will be that the info can be grouped per person in the source – the EP/persona (maybe in addition to other structuring required for recording the source info).

Additional links will be needed directly from the E&C structure to the source&citation structures (to handle current Gedcom, citation in notes, etc.), as will links from task/objective records and links to records possibly holding the source info in other forms.

Although it is a detail in this context, I think separate records for events are necessary; they will be part of the info stored “between”.

Additional work will be required to handle the other ways to record source info.

I have skipped the meta discussion.

gthorud 2011-05-11T15:43:04-07:00

Just wanted to mention that Mike has written something related here http://bettergedcom.wikispaces.com/BetterGEDCOM+Attempt

AdrianB38 2011-05-12T13:24:39-07:00

I wrote in the first posting: "BUT.... My problem with the persona is the degree of interpretation that's necessary to create it. I can see myself looking back past the persona to the original source record to see exactly what's on the census, what the signature looks like, etc. In which case, is there any point to ALWAYS creating a persona?"

Geir responded "I am not sure if I understand this in the context 'Do we need personas?' At some stage you have to start codifying, either in an EP or in a CP. Even if that info is stored elsewhere it has to be codified and linked into the context of an EP or CP. I do not understand why there is a difference between an EP and CP, and why this is an argument to get rid of the EP. You cannot give up the codification if you want it stored in events or other places outside a note."

Codifying is not the issue - as you say, we have to start doing it somewhere. Rather I became concerned that the Level-1 Evidence Person (a.k.a. Persona in nFS terms) might bring its own issues. In which case, are those issues strong enough to challenge its existence?

If I need - as I do - to check on someone's date of birth from a series of censuses, the truth is that it makes more sense to go back to the original date and see the quoted ages. Then check the images to see what was really there. An evidence person on level1 with a birth date of "Btw April 1845 and April 1846" doesn't give me the clue that the figure I thought was a "5" was actually a "3". So - what use was the evidence person? Why not just plug the source record straight into the conclusion person instead of the evidence person?
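A minimal sketch of the arithmetic behind a range such as "Btw April 1845 and April 1846", assuming a census night of 7 April 1861 and a recorded age of 15. Neither value is stated in the thread, and `birth_range` is a hypothetical helper, not part of GEDCOM or of any BetterGEDCOM proposal. It only illustrates how one misread digit in the age shifts the whole derived range.

```python
from datetime import date, timedelta


def _years_before(d: date, years: int) -> date:
    """Same calendar day `years` earlier (29 Feb falls back to 28 Feb)."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def birth_range(census_date: date, age_years: int) -> tuple[date, date]:
    """Earliest and latest birth dates consistent with an age in completed years."""
    # Latest: the person reached the stated age on census day itself.
    latest = _years_before(census_date, age_years)
    # Earliest: the person would reach the next age the day after the census.
    earliest = _years_before(census_date, age_years + 1) + timedelta(days=1)
    return earliest, latest


census_night = date(1861, 4, 7)          # assumed for illustration
as_read = birth_range(census_night, 15)  # the digit read as "5"
as_written = birth_range(census_night, 13)  # the digit was really "3"
```

Here `as_read` runs from 8 April 1845 to 7 April 1846, while `as_written` runs from 8 April 1847 to 7 April 1848. A record that stores only the derived range gives no hint that the underlying age was a reading of the image.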

OK - It's perhaps less of an argument for getting rid of the persona and more a _prompt_ for asking what use a persona is? >>>>What requirement is a persona trying to fulfil?<<<<

I do think we need to put something to sit between a source and a lowest level conclusion person. But maybe it's not a persona record, but an evidence record (whatever that is but I think Louis has suggestions!) And suddenly I'm back with my suggestion "Reqts for E&C 1 - Codifying Source Info" http://bettergedcom.wikispaces.com/message/view/Defining+E%26C+for+BetterGEDCOM/38617826

But I'd pretty much convinced myself there that full codification of the source was impossible except through a persona format. And I stick by that. But what if we only encode a strategic number of items - like name, place, age, birthplace....? Enough to let the app list and group similar sources. The rest of it, we just represent with free format text. (Free format text is anathema to anyone who wants to ensure full control. But if full control is not possible?)

This would be a variation on "Reqts for E&C 1 - Codifying Source Info" since this does not mandate full codification.

Geir - is this last suggestion similar to your suggestion?

gthorud 2011-05-12T17:00:15-07:00

I think we are moving in the same direction. I have drawn up a large diagram with various records; I would like to work a little bit on that and get back with a record structure.

I note one thing about your census example in the context of the BETWEEN structure I suggested above. It will be constrained to containing the data from the source, so for your census it will only contain the ages in a census event, and birth place in a birth event. If you want to interpret the age and record a date in a birth event you will have to do that in a CP. This is different from how I understood an EP would work - it could contain both events and a date in the birth event. You may already have tried to tell me that ....

louiskessler 2011-05-12T18:18:51-07:00

But Evidence Persons are completely wrong. They are conclusions. Evidence should not contain assumptions.

Adrian said:
"If I need - as I do - to check on someone's date of birth from a series of censuses, the truth is that it makes more sense to go back to the original date and see the quoted ages. Then check the images to see what was really there. An evidence person on level1 with a birth date of "Btw April 1845 and April 1846" doesn't give me the clue that the figure I thought was a "5" was actually a "3"."

Yes. This is totally correct.

"So - what use was the evidence person? Why not just plug the source record straight into the conclusion person instead of the evidence person?"

Keep the information from the source as an Evidence record. Do NOT plug this information into your conclusion person. What you put into your conclusion person are (obviously) your conclusion information and reasoning with the link to the evidence that supports it.

AdrianB38 2011-05-13T04:12:40-07:00

Geir - sorry to be a pain, but when you say "This is different from how I understood an EP would work - it could contain both events and a date in the birth event", which type of record can contain both events (presumably you mean both a birth and a census event?) and a date in the birth event? Do you mean a CP could contain birth events and census events, both dated?

And are you saying that your understanding of an EP was that it would only contain an age (at the time of the census), not a birth date? Which is interesting as I'd never had that thought. Though I can see why you suggest it (if I understand you correctly) - adding in the birth event is a matter of interpretation.

AdrianB38 2011-05-13T04:15:23-07:00

These are Tom's thoughts (added with his permission):

Genealogists, whether they accept it or not, are historians. Historians deal in records. For historians to make their conclusions about what happened in the past, they have to gather their records together, take notes on those records, organize those notes in ways that allow them to quickly access the information in the records, allow them to quickly correlate facts from different records, allow them to quickly form and reject hypotheses, and allow them to write up their results in a manner in which all their conclusions are supported by properly formatted citations.

How do they do this? They don't pile their desks with copies of all their physical records. They take notes. In the old days they filled thousands of 3x5 cards with facts extracted from all their records. They did their work by organizing those cards, pulling together cards taken from different sources to substantiate facts or hypotheses. The final organization of their cards often defined the structure of their final works in which they expressed their results and conclusions.

A persona is nothing other than a computer-age analog of a 3x5 researcher's note card, holding all the facts about a person taken from a single physical record. With this analog the genealogist (historian) doesn't have to keep referring back to the original sources, which is a very inconvenient thing to have to do.

The genealogist can do with the set of persona records exactly what a historian does with their 3x5 cards, except the computer can help the genealogist by sorting, indexing, arranging, and displaying those personas in ways far more powerful than anything the old historians were able to do.

Personas summarize facts in a consistent, easy to use, form that makes research and reasoning and conclusion forming much easier than having to refer to the real records. That is the requirement they fulfill. This is how historians have worked for centuries. All personas do is put some of the time-tested practices of historians into the hands of genealogists.

Tom Wetmore

AdrianB38 2011-05-13T04:57:09-07:00

For what it's worth - my current thinking goes like this:
1. I make no promises that I'll agree with myself tomorrow.

2. I agree with the _requirement_ expressed in Tom's comments above. I use the equivalent of the researcher's card except that in my case it might be a line in a spreadsheet, or a box in whatever charting tool I'm using in place of Visio at the moment.

3. The equivalent of the card-shuffling technique is both valuable and possible for me because the coverage of the original English parish registers is very good. Yes, there are exceptions, but many of the exceptions are consistent exceptions and can be detected. In particular, while one can fail to be baptised, never get married, it's quite hard to avoid dying and consequently getting buried - in a churchyard.

4. Where I feel I might be diverging from Tom is in the idea that the solution to the requirement for a card equivalent is necessarily a persona that is the same - in data modelling terms - as a normal person record. This is because of the amount of interpretation that MIGHT go into creating a person record for the evidence.

5. The classic example of interpretation (for me) is the question of age on a census. (When I check GEDCOM 5.5 it does allow AGE_AT_EVENT to be held against any event). I am coming round now to the view (which you the reader may have already held) that when creating the card equivalent, one should only put on it the age - because that's what's on the source - and not the implied birth date / date-range.

6. Technically, there are 2 ways of designing the card equivalent:
- an entity that is the same - in data modelling terms - as a normal person record (or a family or a group or a place...). The danger here is people putting interpreted data into the values;
- an entity that is a much stripped down version of the normal person record (or a family or a group or a place...) with scope for the top 20 or so "things" on which to search. These "things" are direct quotes / copies / abstracts from the text of the source. The rest of the "evidence" (or "information", whichever it is) is written as free-format text. The argument against this approach is that free-format text cannot be controlled and is just as subject to dangers of interpretation. Plus you can only search on these top 20 things unless you do a full text search.

Call option 1 personas if you like and option 2 evidence records, but frankly I can't get excited about a name. It's the degree of interpretation that goes on that concerns me.

7. Without battering my head much further, it does seem to me that these approaches CAN be combined.
(a) create evidence records containing non-interpreted, abstract-only data, keyed for searching on these "top 20" characteristics;
(b) IF you are attracted by full codification, go on from your evidence record to create a "proper" person, containing just the information from the evidence record. But it's your decision, not one that is done automatically for you in the software.

Either way, I am assuming that the tree of evidence and conclusion persons still exists. In the tree, the evidence persons can be at any level (i.e. they may be combinations of data at a previous level) - in effect a conclusion person is the outcome of an iteration of the genealogical research process; the input to that process is the evidence from the sources plus the evidence from the previous state of knowledge about those people (i.e. the evidence people).
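As a rough sketch only, option 2 and step 7(a) above could look something like the following: a handful of searchable keyed fields copied as written in the source, with everything else left as free-format text. `EvidenceRecord`, `find`, the field names and the sample values are all invented for illustration; nothing here is an agreed BetterGEDCOM structure.

```python
from dataclasses import dataclass, field


@dataclass
class EvidenceRecord:
    source_id: str
    # The "top 20" searchable items, copied as written in the source.
    # Note the age is kept as recorded; no birth date is derived here.
    keyed: dict[str, str] = field(default_factory=dict)
    # Everything else from the source, uncodified.
    free_text: str = ""


def find(records, **criteria):
    """Return records whose keyed fields match every criterion, ignoring case."""
    return [
        r for r in records
        if all(r.keyed.get(k, "").lower() == v.lower() for k, v in criteria.items())
    ]


records = [
    EvidenceRecord("S1", {"name": "John Smith", "age": "15", "birthplace": "Nantwich"}, "Son, scholar."),
    EvidenceRecord("S2", {"name": "John Smith", "age": "25", "birthplace": "Cheshire"}, "Lodger, labourer."),
    EvidenceRecord("S3", {"name": "Jane Smith", "age": "40"}, "Head, widow."),
]

matches = find(records, name="john smith")
```

`matches` holds the S1 and S2 records, which the user could then compare by hand. Anything in `free_text` is only reachable by a full text search, which is the limitation noted in point 6.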

AdrianB38 2011-05-07T08:57:24-07:00

If we remove the requirement to create a persona for each person mentioned in each source, then the typical input of data might look like this:

Remember, behind the scenes, we MUST still have a hierarchy of persons. Remember also that the application renders each tree visible to the user as one person under normal updating.

Case 1 - enter a new person from a source - say it's a baptism of a child. Let's just ignore the fact that this person may cause updates to their parents.

1. Create a source-record for the baptism (as conventional GEDCOM).
2. Work whatever magic is necessary in your program to concoct the citations for the upcoming facts.
3. Create a new person in the database.
4. Add the baptism details to that person.

Result: 1 new source record in the database, 1 new person record with their facts pointing to the new source record.

Case 2 - update an existing person from a source - say this time it's a burial for the same child.

1. Create a source-record for the burial (as conventional GEDCOM).
2. Work whatever magic is necessary in your program to concoct the citations for the upcoming facts.
3. Bring up the current details for the child in the appropriate form.
4. Add the burial details to that child's form - plus an estimate for date of death if you do that, automatically linking to the source record in question.
5. Hit the "OK" button to write the details back to the database.
6. Behind the scenes, the program does NOT update the existing person. It creates a new person with a burial event pointing to the burial-source-record (plus death event if you did). It also creates a link between the old-child and the new-child so that "conclusions" held against the old-child can bubble up to the new-child and the old-child is effectively superseded as a standalone record.

The difference from the conventional E&CM is only that the link from the burial source record goes into the level2 new-child, rather than a new level1 persona containing only the burial details.

Case 3 - review the current details for the child on screen or in a report:
1. Select the level2 child (you cannot see the level1 child as it is superseded)
2. Select the report option (say) - the application pulls out the level2 child's details, holding them (the burial and death?) in working storage; it then finds any lower level persons that this level2 has superseded (there's just one) and reads off their details (insert various bits of logic to ensure clashing data is handled - there isn't any here). Thus the report now comprises (in the correct order because the app is clever like that) - name and baptism details from level1 child, burial (and death?) details from level2 child. And prints them.
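A minimal sketch of Cases 1 to 3, assuming each person record simply holds its own facts plus links to the lower-level persons it supersedes. The `Person` class, `resolve`, and the sample name and dates are hypothetical, and the clash rule used here (the higher level wins) is only one possible choice for the "various bits of logic" mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    level: int
    facts: dict[str, str] = field(default_factory=dict)
    source_ids: list[str] = field(default_factory=list)
    supersedes: list["Person"] = field(default_factory=list)


def resolve(person: Person) -> dict[str, str]:
    """Collect facts bottom-up; on a clash the higher level overrides the lower."""
    merged: dict[str, str] = {}
    for lower in person.supersedes:
        merged.update(resolve(lower))
    merged.update(person.facts)
    return merged


# Case 1: new person from a baptism source.
level1 = Person(1, {"name": "John Smith", "baptism": "1 Jan 1800"}, ["S1"])

# Case 2: the burial does not update level1; it creates level2 above it.
level2 = Person(2, {"burial": "3 Mar 1801"}, ["S2"], supersedes=[level1])

# Case 3: the report reads level2 and everything it supersedes.
report = resolve(level2)
```

`report` contains the name and baptism from the level1 child and the burial from the level2 child, while `level1.facts` itself is left unchanged.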

AdrianB38 2011-05-07T09:11:23-07:00

The further extensions to the paradigm above are that
- we MIGHT decide for ourselves that in order to carry out a one family one area reconstruction, then we DO create a persona for each source-record - but that would need to be done manually by creating an individual record ourselves. We would then, once we have a sufficient degree of proof, "merge" these "persona" records - see below.

- we might have been inputting data for a person for some time, and also inputting data for another person for some time. Suddenly, we realise these are the same person. We wish to "merge" these 2 persons. Say one is a level10 person, the other a level7 person.

We select the 2 in the application and press the button that says merge. We may want to enter the proof summary why these 2 are the same person after all, or enter a source that shows it is so.

Behind the scenes, the app creates a level11 person. The level10 person is superseded and pointed to the level11 person. The level7 person is superseded and pointed to the level11 person. The level11 person is pointed to either the source record showing the 2 are the same or the proof summary showing why these 2 are the same if you keep these separate.
The level11 person is given any values necessary to over-write clashes between attributes of the level10 and level7 persons - as discussed elsewhere on this page. Otherwise, when printing or showing stuff on screen, it collects its data from the level10 and level7, which collect their data from the level9 and level6 persons (however many there are), which ....
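The merge described above might be sketched like this, assuming the new person sits one level above the higher of the two being merged and carries only the proof and any over-writing values. `Person`, `merge` and the sample data are hypothetical names used for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity comparison; the links below form cycles
class Person:
    level: int
    facts: dict[str, str] = field(default_factory=dict)
    supersedes: list["Person"] = field(default_factory=list)
    superseded_by: Optional["Person"] = None
    proof: str = ""  # proof summary, or a pointer to a source record


def merge(a: Person, b: Person, proof: str,
          overrides: Optional[dict[str, str]] = None) -> Person:
    """Create a new top person above a and b; neither original is modified
    apart from being marked as superseded."""
    top = Person(
        level=max(a.level, b.level) + 1,
        facts=dict(overrides or {}),  # only values needed to settle clashes
        supersedes=[a, b],
        proof=proof,
    )
    a.superseded_by = top
    b.superseded_by = top
    return top


level10 = Person(10, {"name": "Wm. Brown"})
level7 = Person(7, {"name": "William Browne"})
level11 = merge(level10, level7,
                proof="Same signature on both documents.",
                overrides={"name": "William Brown"})
```

Because the lower persons keep their own facts, undoing the merge in this sketch would only mean discarding `level11` and clearing the two `superseded_by` links.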

AdrianB38 2011-05-07T13:06:10-07:00

It has just occurred to me that any GEDCOM file loaded into a BG-compatible program will have a tree of conclusion people with sources hanging below them rather than personas, so I _think_ the above idea (making personas optional) doesn't bring any issues for data models.

For the avoidance of doubt, there are many circumstances where I would, myself, generate personas.

louiskessler 2011-05-07T18:15:49-07:00

Just to put my 2 cents worth in, I do not like or want to see Personas at all.

I feel a person is a conclusion, and is for someone's own genealogy information and that is all.

In source data, no one should be trying to identify "persons", because it is a conclusion they are drawing, that the source is referring to someone in particular. If you get 5 different people looking at a source, they may make 5 different conclusions.

So I would like to see sources and evidence as records of their own. Doing so will allow a repository to create BetterGEDCOMs of just the material they have, without jumping to any conclusions or needing to include any "personas".

Within the citation data, you should simply have the person's name and other identifying information. But no attempt to personify him.

That's my (rather strong) feelings about this. That's also how I'm implementing citations in my software.

Feel free to take this any way you wish. I expect many people will disagree with me, but I see it as a simple and powerful idea.

Louis

louiskessler 2011-05-07T18:19:16-07:00

p.s. Where I said "citations" in my last post, please read "evidence".

I mix them up a bit because I'll be using the "citation" holder in GEDCOM to be my evidence holder.

Louis

AdrianB38 2011-05-08T05:06:33-07:00

Louis - just me trying to make sure I understand... By mentioning "evidence", I take it that you are advocating an attempt to do _something_ better than a GEDCOM based app does with sources and evidence at the moment?

louiskessler 2011-05-08T10:05:46-07:00

Exactly.

What GEDCOM calls SOURCE_CITATION, I call evidence. Tom and I hacked that out a few months ago somewhere here on BetterGEDCOM.

The problem with the SOURCE_CITATION in GEDCOM is that it is attached to the Source reference that is attached to the Event that is attached to the individual, e.g.:

0 @I1@ INDI
1 BIRT
2 SOUR @S1@
3 PAGE 43
3 EVEN BIRT
4 ROLE MOTH
3 DATA
4 DATE date (as on the record)
4 TEXT This is the text of the evidence
3 OBJE Here are links to the scanned image etc
3 QUAY 3 (personal evaluation of the evidence quality)
3 NOTE Here is where I would put my conclusions.

GEDCOM has mixed up the Evidence with the Conclusions. I think it is essential that evidence be its own record, like this:

0 @I1@ INDI
1 BIRT
2 EVID @E1@
3 QUAY 3 (personal evaluation of the evidence quality)
3 NOTE Here is where I would put my conclusions.

0 @E1@ EVID
1 SOUR @S1@
2 PAGE 43
1 EVEN BIRT
2 ROLE MOTH
1 DATA
2 DATE date (as on the record)
2 NAME first /Last/ (as on the record)
2 PLAC place (as on the record)
2 TEXT This is the text of the evidence
2 OBJE Here is a link to the scanned image

Note that this is all currently valid in GEDCOM, except for making the EVID a record, and for allowing the NAME and PLAC tags under the DATA tag, which I feel is essential.
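To make the separation concrete, here is a small sketch that reads level-numbered lines like those above and pulls out only the SOUR and EVID records. It is a deliberately simplified reader (no CONT/CONC handling, no validation), `EVID` is the proposed tag from this post rather than anything in the GEDCOM standard, and the function names and sample values are assumptions made for illustration.

```python
SAMPLE = """\
0 @I1@ INDI
1 BIRT
2 EVID @E1@
3 QUAY 3
0 @E1@ EVID
1 SOUR @S1@
2 PAGE 43
1 DATA
2 NAME Mary /Jones/
0 @S1@ SOUR
1 TITL Parish register
"""


def parse(text):
    """Parse level-numbered lines into nested dicts, one per line."""
    root = {"tag": "ROOT", "xref": None, "value": "", "children": []}
    stack = [(-1, root)]
    for raw in text.strip().splitlines():
        level_str, _, rest = raw.strip().partition(" ")
        level = int(level_str)
        xref = None
        if rest.startswith("@"):  # record line: level @XREF@ TAG
            xref, _, rest = rest.partition(" ")
        tag, _, value = rest.partition(" ")
        node = {"tag": tag, "xref": xref, "value": value, "children": []}
        while stack[-1][0] >= level:  # climb back to this line's parent
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((level, node))
    return root["children"]


def evidence_only(records):
    """Keep only the records a repository could publish without conclusions."""
    return [r for r in records if r["tag"] in ("SOUR", "EVID")]


def cited_evidence(record):
    """List the evidence pointers found anywhere under a record."""
    found = []
    for child in record["children"]:
        if child["tag"] == "EVID" and child["value"].startswith("@"):
            found.append(child["value"])
        found.extend(cited_evidence(child))
    return found


records = parse(SAMPLE)
repository_file = evidence_only(records)
```

`repository_file` holds the `@E1@` and `@S1@` records with no INDI data, and `cited_evidence(records[0])` returns `["@E1@"]`, the link from the conclusion person back to the evidence.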

So what I see is a repository (your library, public records, etc.) can have volunteers making GEDCOM files that are full of SOUR and EVID records only, that describe every item of evidence that they have. There will be no conclusions at all in this data. "Just the facts, ma'am". This would allow volunteers to transcribe this material mechanically without interpretation and create a resource (hopefully online) that you or I or anyone could search to try to see if there is some evidence in some source at that repository about our ancestor. Our search would involve searching through the names, events, dates, places and even the text in the evidence to narrow down possible connections.

Doesn't that seem logical?

gthorud 2011-05-09T15:22:57-07:00

I am trying to see the implications of this for the data model - are there any? I guess everyone agrees that it should be possible to store data from the source – not codified - linked but separate from the E&C persons. But if you want some of that info to output you will have to encode it in a note, event, citation or something else that outputs, or are we creating a new way to output source data? As Adrian has observed, you will have to be able to import data from a non-E&C-program, so you cannot always expect source data to be attached to an EP only.

The implications of not creating an EP is that you can not unlink the information by unlinking the EP. Also, you need the EP to be able to record the name used in that source – if you record it in the CP, it would override that in the superseded EP previously recorded in Adrian’s example. (We have not really discussed much about the handling of names in E&C.)

Re. Evidence in a separate record. Maybe Louis could point us to the previous discussion of that. Mike has also suggested something called “Extractions”. I was directed by today’s Developer meeting to create a requirement for what I have called “Collections of source data” – Data09 - the next version of an unnamed program is likely to have features based on a similar idea. BUT, how is this related to the creation of an EP – or not? I tend to see it as a different issue?

gthorud 2011-05-09T17:34:30-07:00

Well, I see that Louis's Evidence record is related to the discussion - I will have to think about this.

louiskessler 2011-05-09T19:14:42-07:00

Geir:

No, I don't want to see EPs at all. To me, it's an unneeded complication.

For some more of my thinking on this please read my blog post from January 16th: http://www.beholdgenealogy.com/blog/?p=805

After reading that, then go to the previous discussion on BetterGEDCOM which was 4 pages of discussion ending with: http://bettergedcom.wikispaces.com/message/view/BetterGEDCOM+Comparisons/32431138?o=60#32733704

Louis

louiskessler 2011-05-09T19:25:07-07:00

Geir said:

"The implications of not creating an EP is that you can not unlink the information by unlinking the EP."

We can create an evidence record, and not an evidence person.

"Also, you need the EP to be able to record the name used in that source – if you record it in the CP, it would override that in the superseded EP previously recorded"

No. I see searching an online database of evidence and finding, say 6 interesting items. You download those and add them (hopefully automatically via your software) to your data. They are simply evidence records.

Now you link the event in your conclusion person to the evidence, and make a note about what you think that piece of evidence adds or doesn't add to your information. If you think the information supersedes what you have, then you update that information yourself in your conclusion person.

The other *important* thing is that *you* update the information *yourself*. This should *not* be an automated process that is done for you. Your conclusions are your own and you need to think about each piece of evidence you come across and what you want to make of it.

Louis

gthorud 2011-05-20T13:45:36-07:00

Geir's working document

I have uploaded version 0.000002 of my document.

E&C and Extract and Task and Citation etc Records v.002.pdf

I had planned to make a change-barred version, but ended up adding so much text that it is better not to do that. I have changed very little of what was in version 1.

Unfortunately I could not resist putting in stuff not strictly related to E&C; it should perhaps be left alone at the moment - and then we can find places to discuss it.

I am not sure how to discuss the document, but I suggest the following since I do not want a marathon discussion of all things in the document.

We should try to discuss only the overall record structure and relations, and things related to E&C - Evidence - Citations on this page.

Existing discussions on an aspect should be continued in old topics when the subject is appropriate, but refer to the doc in the first new posting.

New topics should be started on new issues, as small as possible - if possible.

As I see the multi level variant of E&C as something that we should at least develop a solution for, and because it is optional, I would prefer not to discuss for the 100th time whether we should develop it, but rather how it should work together with Lookups/Evidence and Citations etc.

gthorud 2011-05-20T19:58:58-07:00

Something missing in the document is handling of links to citation records when merging person records into a 1 or 2 level structure from a multilevel structure, when importing a file.

There was at some stage a discussion about which parts (eg events) of an (old) person record a source/citation reference for the person would apply to - eg. did it apply to all the content in the record or maybe only parts (not necessarily all events) etc. Does anyone remember where this discussion took place? I am sure that at least Louis, Adrian and Tom participated.

louiskessler 2011-05-20T22:00:12-07:00

Geir,

It might be in this thread starting from this message:
http://bettergedcom.wikispaces.com/message/view/BetterGEDCOM+Comparisons/32431138?o=20#32575472

gthorud 2011-05-22T08:07:59-07:00

Thanks, Louis.

Unfortunately it did not solve the problem, but we can take that in a separate discussion.

ttwetmore 2011-05-29T12:58:18-07:00

Geir,

I just uploaded a file with comments.

GeirComments.pdf

Tom

gthorud 2011-05-30T11:15:23-07:00

Tom,

I will try to break up my response into several discussions.

First some clarifications on the last diagram in my document.

1 level a

I should perhaps have found another term than EP (the only one I came up with was Extraction person, but that would also be abbreviated EP), because it is not an EP in a multilevel hierarchy, but it contains the same data as an EP in a hierarchy would. The EP in this diagram is holding the codified source data in records that are not pointed to directly from a CP. Assume it represents a person in a Gedcom codified transcription of a church book that I have imported.

2 level (Gentech)

It is assumed that Gentech just links Personas. Since Gentech seems to have been implemented by nFS, this is an attempt to see how nFS could be handled – BUT we don’t know the internal details of nFS or how they would want to export it – so the exact details on how to transport it have to be dealt with later. One alternative that I have not mentioned, is to copy all “non-superseded” info into the top level person, rather than using “superseded”.

N level 1

Will comment on this separately.

N level 2

EP5 is the same type of “EP” (Extraction Person) – as in 1 Level a – above. NB means “important”, "nota bene" in Latin. NBs highlight what is special in the diagram.

1 level b

It describes how you would map current Gedcom into this model.

gthorud 2011-05-30T12:55:00-07:00

I have created two discussion topics here

http://bettergedcom.wikispaces.com/message/view/Defining+E%26C+for+BetterGEDCOM/39723332

http://bettergedcom.wikispaces.com/message/view/Defining+E%26C+for+BetterGEDCOM/39724034

The following is the remaining parts of my response to Tom’s comments.

Source and Source Lookup

Why 2 records? Because the Citation needs to reference the Lookup.

Why not point from Persona to Lookup – Most implementations are and will be using citations to point to source extracts, so it seems better to have it as it is. I don’t understand the “depends on/generated by” argument.

I can guess what eventas are, if my guess is correct, they are already in, if not – I don’t know. Undefined terms are not very helpful.

Citation Record

One reason for having Citation as a separate record would be due to merging of Personas in a multi level structure into a Conclusion Person (1 or 2 level implementation doing import) where a Persona level citation would have to be attached to only those events etc. it applies to. Then you are likely to have several pointers to the same citation. See discussion http://bettergedcom.wikispaces.com/message/view/Defining+E%26C+for+BetterGEDCOM/39723332

Source Free Text Extract Record.

See my drawing posted 30 May under the discussion of Rec Cats Data09. I see no mentioning of summaries, synopsis etc in the DeadEnds XML document, maybe I have missed something. Most implementations will not have an EP so can’t have it there. Can’t have it in the source since it must be referable.

Person Records

I have deliberately kept person-person relations out of this, they are not a primary concern of this doc.

Multi role events are already in the document. I don’t see any point in having more of the event info in Person Record – as you suggest for vital events.

Re. Climbing notes – see http://bettergedcom.wikispaces.com/message/view/Defining+E%26C+for+BetterGEDCOM/38364858 It is a way to concatenate the notes from lower levels into what would appear as eg. a sequence of paragraphs, in a user defined order - possibly excluding some of the notes from lower levels. It is not an ideal solution, but merging multiple levels into one will not work without it.
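
As a rough illustration of the climbing-note idea described above, here is a minimal Python sketch. The record layout and the names `Note`, `climb_notes`, `order` and `exclude` are assumptions made for this example only; they are not taken from the linked discussion or from any requirements document.

```python
from dataclasses import dataclass


@dataclass
class Note:
    note_id: str
    text: str


def climb_notes(lower_notes, order=None, exclude=()):
    """Concatenate notes from lower levels into a sequence of paragraphs.

    order   -- optional list of note ids giving the user-defined order;
               notes not listed follow the listed ones in their original order
    exclude -- ids of notes that should not climb to the higher level
    """
    kept = [n for n in lower_notes if n.note_id not in exclude]
    if order:
        rank = {note_id: i for i, note_id in enumerate(order)}
        # sorted() is stable, so unlisted notes keep their relative order
        kept = sorted(kept, key=lambda n: rank.get(n.note_id, len(rank)))
    return "\n\n".join(n.text for n in kept)


# Hypothetical notes attached to three lower-level records
notes = [
    Note("n1", "1865 census: John, age 30, farmer."),
    Note("n2", "Transcriber's remark about handwriting."),
    Note("n3", "1835 baptism: John, son of Ole."),
]
combined = climb_notes(notes, order=["n3", "n1"], exclude={"n2"})
print(combined)
```

The lower-level notes themselves are left untouched, so unlinking a level would only require recomputing the concatenation.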

The citation record is not dependent on Personas or Conclusion persons, and can have argumentation. The details of citation are an area that needs much more work; I have a 10 page doc in work and I am sure Gene has a lot to say about it. If they hold Conclusion justifications – well that might depend on what you want to conclude about, but if you want the argument to output in a citation – that’s what citations are for.

Event Common Data Record

You give no argument why you want the Role stuff in this record.

Subordinate Events is a new, and probably very complex concept – Why?

Further discussion should preferably be in new separate discussions.

ttwetmore 2011-05-30T14:17:42-07:00

The following are the remaining parts of my response to Tom’s comments.

T > I have added my answers to Geir's comments under the T > notation.
Source and Source Lookup

Why 2 records? Because the Citation needs to reference the Lookup.

T > I don't understand that reason. I guess I would need an example.

Why not point from Persona to Lookup – Most implementations are and will be using citations to point to source extracts, so it seems better to have it as it is. I don’t understand the “depends on/generated by” argument.

T > I don't understand that either. We have a vocabulary mismatch I think. First I have trouble because I don't know what a Lookup is. Because I think that a Source and a Source Lookup should be the Source record I am thinking of a Lookup as "just" a Source record. That might not be right.

T > Now I think we have something going in opposite directions in our minds. Let me say what I think should happen to the information in a Source. I think it should be converted into Persona and Event Records. This agrees with your model to a certain extent as you have Sources having codified Person and Event information either in them or referred to by them. Here is the "opposite" direction thing. I think you look at those Person Records as being "generated by" the Source record and therefore you have the Source records point to the Person records. I think of the Person Records as "depending upon" or "coming from" the Source, so I have the Person Records point to the Sources. When a genealogist is working with his data, he will basically be starting with Person records, not Source records, so it is most natural for the researcher to need to go from Person Records to Source Records.

T > Please realize that in the above I have omitted mention of Citation records. That is because the DeadEnds notion of a Citation is extra fields added to the pointer from a Person record to a Source record. So let's say I add the Citation to the DeadEnds model. Then things would go like this:

T > Source Record -- Describes where the information comes from. It can contain within itself additional fields about the evidence it contains.

T > Person/a Record -- Record that codifies information about a person taken from evidence in a source. That record points to a Citation record (I'm accepting the Citation record by converting the DeadEnds SourceReference structure, found in DeadEnds Person records, into a new record type, the Citation record).

T > Citation Record -- A record that provides extra information needed to properly cite or document information found in a Source. It points to the Source because it "depends" on the Source. Person Records point to these Citation records.

T > Thus the "DeadEnds" chain is Person -> Citation -> Source -> Repository for level-1 Persons, and Person -> Citation, where Citation is a different kind of beast for level-2+ Persons.
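
To make the direction of the pointers concrete, the level-1 chain can be sketched in Python as below. This is only an illustration: the class and field names (including `where_in_source`) and the sample data are assumptions, not the actual DeadEnds or Better GEDCOM definitions, and the different kind of Citation used for level-2+ Persons is not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Repository:
    name: str


@dataclass
class Source:
    title: str
    repository: Optional[Repository] = None


@dataclass
class Citation:
    source: Source             # a Citation "depends on" its Source
    where_in_source: str = ""  # hypothetical detail field


@dataclass
class Person:
    name: str
    citation: Optional[Citation] = None


def trace(person):
    """Follow Person -> Citation -> Source -> Repository and list the hops."""
    hops = [f"Person: {person.name}"]
    citation = person.citation
    if citation is None:
        return hops
    hops.append(f"Citation: {citation.where_in_source}")
    hops.append(f"Source: {citation.source.title}")
    if citation.source.repository is not None:
        hops.append(f"Repository: {citation.source.repository.name}")
    return hops


# Hypothetical example data
archive = Repository("County Archive")
register = Source("Parish register", archive)
john = Person("John Jones", Citation(register, "p. 12"))
print(trace(john))
```

Every pointer runs from the dependent record to the record it depends on, which is the direction argued for above.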

I can guess what eventas are, if my guess is correct, they are already in, if not – I don’t know. Undefined terms are not very helpful.

T > I think they are in. They are the codified Event Records created from evidence in Source Records. They are multi-level also.

Citation Record

One reason for having Citation as a separate record would be due to merging of Personas in a multi level structure into a Conclusion Person (1 or 2 level implementation doing import) where a Persona level citation would have to be attached to only those events etc. it applies to. Then you are likely to have several pointers to the same citation. See discussion http://bettergedcom.wikispaces.com/message/view/Defining+E%26C+for+BetterGEDCOM/39723332

T > Well, yes, that's fine, but the DeadEnds SourceReference structure has exactly the same properties and abilities. Let's just take it as a given that I have accepted the practicality of the Citation record as you have proposed it.

Source Free Text Extract Record.

See my drawing posted 30 May under the discussion of Rec Cats Data09. I see no mention of summaries, synopses etc. in the DeadEnds XML document; maybe I have missed something. Most implementations will not have an EP, so it can’t go there. It can’t go in the source either, since it must be referable.

T > My underlying point is that you have, in my opinion, too many records in your "Source subsystem". I think all four of them (Source, Source Lookup, Source Free Extract and Citation) can all be handled by the single Source record with the SourceReference structure that points to Sources. My gut says that having four different records for "just" source information is too many.

T > The mention of synopses, etc., is in some of the earlier DeadEnds documents. I think the one that mentions them is on the DeadEnds site. The reason I didn't mention them in the document I wrote for Better GEDCOM was that I was trying to boil things down to essentials and I thought that was a bit superfluous at the time.

Person Records

I have deliberately kept person-person relations out of this, they are not a primary concern of this doc.

T > That is fine. I don't see why you wouldn't want to include them, however, as they are a critical part of what we need in genealogical databases. I guess you are saying that in your document you are stressing only the Source-based aspects of the model. Fine.

Multi role events are already in the document. I don’t see any point in having more of the event info in Person Record – as you suggest for vital events.

T > Here, then, I believe you are making a SERIOUS error of omission. Let's say your item of evidence is this single sentence, "John Jones was born on 14 March 1888 in Liverpool, England". What are you going to do with this? My answer is to create one Person record and keep the birth information as a vital event inside the person record. Are you saying you want to create a separate multi-role event record (with just one role) for this? As I said in my comments, sure you can do this, but doesn't it feel like forcing a square peg into a round hole?

T > Clearly I don't mind lots of records in my databases (eg., all those 100s and 100s of Persona records that some others object to), but I don't believe in adding truly superfluous records (eg., Source Lookup, Source Text Extract, unnecessary Event records, ...).

Re. Climbing notes – see http://bettergedcom.wikispaces.com/message/view/Defining+E%26C+for+BetterGEDCOM/38364858 It is a way to concatenate the notes from lower levels into what would appear as eg. a sequence of paragraphs, in a user defined order - possibly excluding some of the notes from lower levels. It is not an ideal solution, but merging multiple levels into one will not work without it.

T > Thanks. I'm more worried about what happens when 3 different level-1 Person records have different names, or when 2 different level-1 persons have conflicting birth date information. When these level-1 Persons are joined into level-2 Persons how are those things to be resolved? What should the level-2 person created by binding those level-1 records "look like" when displayed on the screen or written up in reports? The easy answer is "just copy up the information you want displayed from chosen level-1 persons", but who in their right mind would ever have the patience to do that (well, maybe some)? There have to be some generic rules for this, with some overrides when the rules don't work. I thought this was somehow implied in the climbing notes issue; I see now that I was wrong, so now I wonder how you would answer the question that was truly behind my issue.

The citation record is not dependent on Personas or Conclusion persons, and can have argumentation. The details of citation are an area that needs much more work; I have a 10 page doc in work and I am sure Gene has a lot to say about it. If they hold Conclusion justifications – well that might depend on what you want to conclude about, but if you want the argument to output in a citation – that’s what citations are for.

T > Thanks, I had come to that conclusion, yuck, yuck; thanks for confirming it.

Event Common Data Record

You give no argument why you want the Role stuff in this record.

T > I guess the best argument is that having the roles in the Event allows one to easily infer relationships. I hope you can see that. If only the role players point to the events, then there is no easy way to see that two of the role players are in the same event and that their roles imply a relationship between them.

T > Example: you have this sentence: "John Smith was born to James Smith on March 22, 1889". I would create a 2-role Event from this, and I would create two level-1 Persons from this (and of course a Source record and 3 Citation records). In your method I think you would create the same 3 evidence records and you would have the 2 Person records point to the Event record with their roles (what you call EventParticipation structures). Fine, I do that too in DeadEnds. But, if that is all we do, how are you going to find out that John Smith and James Smith are related? You can definitely go from either John or James Smith to the Event, and you know right away that they were either the father or the child in a birth event (that info is in Event Participation or the RoleReference [what I call it] in the Person record). But given you know one of the participants AND you know the event, how are you going to find the OTHER participant and how are you going to infer the relationship between them? If the Events also have RoleReferences (as they do in the DeadEnds model) then all works exactly as you would like.

T > Let's make this practical. Imagine you have the job of writing the software for finding fathers of Persons in the Better GEDCOM format. Grant me that there are two ways for this information to be found in Better GEDCOM Person records. First as person to person relationships (what you have said above you don't want to cover yet), or as roles in birth events. (There may be other ways, eg, the controversial Family record that you have said nothing about yet, so say that's all for now). Now imagine writing that software in your head. The input to the algorithm is a Person record; the output from the algorithm is the set of Person records that meet the criteria of being the input Person's father. First imagine writing it when you don't have role pointers from Events to Persons; then imagine writing it when you do. BINGO, now you know why I want the Role stuff in the Events!!
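
The thought experiment above can be sketched in Python. The dictionary layout, the role names `child` and `father`, and both function names are assumptions made for illustration; neither DeadEnds nor any Better GEDCOM draft defines the records this way.

```python
# Hypothetical records built from the sentence
# "John Smith was born to James Smith on March 22, 1889".
persons = {
    "i1": {"name": "John Smith", "roles": [("child", "e1")]},
    "i2": {"name": "James Smith", "roles": [("father", "e1")]},
}
events = {
    "e1": {"type": "birth", "date": "22 March 1889",
           "roles": [("child", "i1"), ("father", "i2")]},
}


def fathers_with_event_roles(person_id):
    """Events carry role references: follow person -> event -> father."""
    found = []
    for role, event_id in persons[person_id]["roles"]:
        event = events[event_id]
        if role == "child" and event["type"] == "birth":
            found += [pid for r, pid in event["roles"] if r == "father"]
    return found


def fathers_without_event_roles(person_id):
    """Only persons point to events: every person record must be scanned."""
    birth_events = {
        event_id
        for role, event_id in persons[person_id]["roles"]
        if role == "child" and events[event_id]["type"] == "birth"
    }
    return [
        pid
        for pid, record in persons.items()
        for role, event_id in record["roles"]
        if role == "father" and event_id in birth_events
    ]


print(fathers_with_event_roles("i1"))     # direct lookup through the event
print(fathers_without_event_roles("i1"))  # scan over all person records
```

Both functions find the same father. The second has to visit every Person record unless the importing program first builds the reverse links itself.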

T > Okay, now I'll be honest. If we were to "normalize" the Better GEDCOM model into a large number of large-sized relational tables, we would of course include a table that would relate all person ids to all event ids that the person was a role player in. If this table exists then my argument above is superfluous. We can infer fathers by an SQL Select statement that mentions a person table, this mapping table, and the event table. That's fine. I have this obstinate goal, I guess, of designing implementations of data models that can be efficiently dealt with when they are in a NETWORK database form rather than in a RELATIONAL form. Think of it this way. Imagine that a genealogical application decides to adhere to the Better GEDCOM "standard," and better yet, imagine that the developers of that application decide that since the Better GEDCOM gurus did such a fabulous job of designing their model, that they (the app developers) want their database to be only and exactly made up of Better GEDCOM records. This is not a far-fetched idea when you think of how many genealogical applications today have internal databases based on GEDCOM.
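
For comparison, the relational variant mentioned above might look like the following sketch, using Python's standard `sqlite3` module with an in-memory database. The table and column names (`person`, `event`, `participation`, `role`) are hypothetical and chosen only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE event  (id TEXT PRIMARY KEY, type TEXT, date TEXT);
    -- the mapping table: one row per person playing a role in an event
    CREATE TABLE participation (person_id TEXT, event_id TEXT, role TEXT);

    INSERT INTO person VALUES ('i1', 'John Smith');
    INSERT INTO person VALUES ('i2', 'James Smith');
    INSERT INTO event  VALUES ('e1', 'birth', '1889-03-22');
    INSERT INTO participation VALUES ('i1', 'e1', 'child');
    INSERT INTO participation VALUES ('i2', 'e1', 'father');
""")

# Select the fathers of one person via the person, mapping and event tables.
FATHERS_OF = """
    SELECT father.id, father.name
    FROM participation AS child_role
    JOIN event
         ON event.id = child_role.event_id AND event.type = 'birth'
    JOIN participation AS father_role
         ON father_role.event_id = event.id AND father_role.role = 'father'
    JOIN person AS father
         ON father.id = father_role.person_id
    WHERE child_role.person_id = ? AND child_role.role = 'child'
"""

fathers = conn.execute(FATHERS_OF, ("i1",)).fetchall()
print(fathers)
```

Here the mapping table does the work that role references inside the Event record do in the network form.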

T > So I have a PERSONAL goal for Better GEDCOM. That is, if the Better GEDCOM model were to be used as the basis of a genealogical app's database, that developers would be able to design and implement excellent genealogical applications with databases of Better GEDCOM records, and the applications would be as fast and performance savvy as any other application. That's a very long winded and personal answer as to why I want role references in Event records!!

T > It's more than personal actually. I am writing DeadEnds code now, and my model is very similar to the one you have proposed (given all my comments that is). And of course I am implementing the database as just DeadEnds records! The word "normalize" will never pass my lips!

Subordinate Events is a new, and probably very complex concept – Why?

T > For exactly the same reason we need multi-level Persons -- they are part of the evidence and conclusion chain -- we get evidence about the occurrences of events, and we eventually conclude that such an event actually occurred. I grant you that thinking about this leads to headaches.

ttwetmore 2011-05-31T03:15:22-07:00

Geir said "Multi role events are already in the document. I don’t see any point in having more of the event info in Person Record – as you suggest for vital events. "

I want to do one more example to show why I think the vital event is necessary. Say you have this evidence, "In 1838 John Smith, age 36, resided in Salem, Massachusetts."

Okay, let's say we create an event record (reside) and a persona record (John Smith) from this (using JSON-like notation):

event: {id: e1, type: reside, date: 1838, place: "Salem, Massachusetts", role: {type: primary, person: i1}} <<- here role is the same as Geir's Event Participation

person: {id: i1, role: {type: primary, event: e1, age: 36 years}} <<-- here Person also has a role attribute, not in Geir's model as discussed in my last response.

(Pretend I also create a source record and 2 citation records.)

Please tell me what important fact is missing? We haven't said anything about John's birth. We have some information here about when John Smith was born. We don't have any information about a real birth event, but we have good evidence about when approximately he was born. In fact, this fact might be the most important thing about this record. We must capture that information firmly and decisively.

What are you going to do with that information? Are you going to create a birth event? Do you think that this evidence from which you can infer some birth information justifies you in actually creating a birth event?

My answers are simple. Don't create a birth event as it's not justified from this evidence. But put a birth vital event in the person record like this:

person: {id: i1, role: {type: primary, event: e1, age: 36 years}, vital: {type: birth, date: about 1802, note: "date computed from age"}}
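
The "date computed from age" step is simple arithmetic: the evidence gives the year 1838 and an age of 36, so the birth falls in 1801 or 1802 depending on the birthday, hence "about 1802". A minimal Python sketch follows; the function name and the dictionary keys simply mirror the JSON-like notation above and are otherwise assumptions.

```python
def birth_vital_from_age(event_year, age_years):
    """Build a vital-event structure with a birth year inferred from an age."""
    return {
        "type": "birth",
        # stated age subtracted from the event year; the true year may be
        # one earlier, which is why the date is qualified with "about"
        "date": f"about {event_year - age_years}",
        "note": "date computed from age",
    }


vital = birth_vital_from_age(1838, 36)
print(vital["date"])
```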

My rule of thumb is that evidence that directly mentions an actual event should be codified into an event record, but that evidence that simply allows you to infer some fact about an event should not be codified into event records. Can you see the distinction? I really don't think it's subtle. Event records are used for the first case, and vital event structures are used for the second.

How about a slightly more complex example, a marriage record: "John Doe, 28, of Hamilton, Massachusetts, married Jane Jones, 24, of Ipswich, Massachusetts, on June 26, 1876. Father of the groom is James Doe, shipbuilder, and father of the bride is Samuel Jones, shoemaker."

Please think about how you would codify this example into Personas and Events. There are obviously 4 personas and a marriage event. But there is so much more. But to capture all the information here all we need to do is add the person to person relationship (not yet in Geir's model, but hopefully coming), and the vital event (which Geir states we don't need). Just please try to work this up into useful records. I'll give my answer later.

Tom

gthorud 2011-05-31T13:46:22-07:00

Tom,

I have to take this in steps, so it may take some time before I have covered all your postings.

I have trouble accepting that there should be restrictions on the standard because it should fit a network database. As has been mentioned in the discussions on this page, it is assumed that programs will add two-way links internally as needed when they import the file. A sound principle in standardisation has as long as I can remember been that in principle the standard should be independent of the internal workings of programs - I think that principle should apply to BetterGedcom as well.

Also, given that most major implementations use a database internally, I don't think we should impose restrictions on BG with the intention that it should be used as an internal storage format.

Most databases today have no problem handling many record types, so if you have a problem with that I suggest you change your technology. There are several free databases out there.

You say you don't understand why I have a Lookup record. One reason is that a citation (or some reasoning) often has to refer to several sources. Since we (and Gedcom) have no concept that allows us to refer to internal structures in a record, and since I have not seen a need for it, you cannot simply refer to the Source record, you have to have an intermediate that among other things tells you "where in source". I have no need for a structure in the person record stating "where in source", and it would simply not work because a citation can refer to several sources (that is, lookups).
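
One possible reading of this arrangement, sketched in Python: a Citation points to several Lookups, and each Lookup says where in one Source the information was found. The class and field names, and the sample data, are assumptions made for illustration and are not the record definitions from the document under discussion.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Source:
    title: str


@dataclass
class Lookup:
    source: Source
    where_in_source: str  # the intermediate carries "where in source"


@dataclass
class Citation:
    text: str
    lookups: List[Lookup] = field(default_factory=list)

    def sources(self):
        """Titles of the distinct sources reached through the lookups."""
        titles = []
        for lookup in self.lookups:
            if lookup.source.title not in titles:
                titles.append(lookup.source.title)
        return titles


# Hypothetical example: one piece of reasoning resting on two sources
census = Source("1865 census")
register = Source("Parish register")
citation = Citation(
    "Age in the census agrees with the baptism entry.",
    [
        Lookup(census, "district 4, farm 12"),
        Lookup(register, "baptisms 1835, no. 7"),
        Lookup(census, "district 4, farm 13"),
    ],
)
print(citation.sources())
```

Because the "where in source" detail lives on the Lookup, the same Source record can be referenced at several places without any pointer into its internal structure.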

More to follow ...

ttwetmore 2011-05-31T18:48:26-07:00

Geir,

I have to take this in steps, so it may take some time before I have covered all your postings.

Thanks for your patience with my obstinacy.

I have trouble accepting that there should be restrictions on the standard because it should fit a network database.

I think it's a moot point. If the standard is based on an entity-relational data model, which I assume it will be, the network database is automatically its most natural implementation.

As has been mentioned in the discussions on this page, it is assumed that programs will add two-way links internally as needed when they import the file.

I agree. But I also believe that there is a preferred direction when only one is realized.

A sound principle in standardisation has as long as I can remember been that in principle the standard should be independent of the internal workings of programs - I think that principle should apply to BetterGedcom as well.

I would say that a standard should not be based on the internal working of any given program, but I would say that a good standard will naturally suggest implementation ideas. Just think of this in MVC (model, view, controller) terms. BG provides the model. The view and the controller are highly constrained by the contents of the model. No matter what the BG model is, that model will automatically dictate aspects of any implementation that uses the model. I don't see any issue here.

Also, given that most major implementations use a database internally, I don't think we should impose restrictions on BG with the intention that it should be used as an internal storage format.

I think that's a moot point too. If the BG standard makes any sense at all it will be able to be used as an internal storage format. No need to impose any restrictions. One of the requirements on BG is that it be usable as an archival format. For me there is almost no practical difference between an internal storage format and an archival format.

Most databases today have no problem handling many record types, so if you have a problem with that I suggest you change your technology. There are several free databases out there.

I believe you misunderstood me. I was not saying that your model has too many record types because it will be hard to implement them in a network database (it would not be). I was saying that your model has too many record types because I believe it has too many record types! As I have said, I see no reason for the Source Lookup or the Source Text Extract records. I have accepted that it is okay for my idea of the SourceReference structure to be "grown" into a Citation record, or I would also be saying that the Citation record is not needed.

You say you don't understand why I have a Lookup record. One reason is that a citation (or some reasoning) often has to refer to several sources.

That is a real surprise to me. If you believe that a citation may need to refer to many sources then your concept of a citation is far different than mine. No wonder I don't understand the need for the Lookup record, which I still don't understand.

Since we (and Gedcom) have no concept that allows us to refer to internal structures in a record, and since I have not seen a need for it, you cannot simply refer to the Source record, you have to have an intermediate that among other things tell you "where in source".

Yes, you need to be able to say where in a source something comes from. That is what a Citation should be. I think you must clarify for us exactly what you think a Citation is. I cannot figure it out from your document.

I have no need for a structure in the person record stating "where in source", and it would simply not work because a citation can refer to several sources (that is, lookups).

I am still wholly confused by this. I think the source of my confusion is that you have, in my opinion, an unusual concept of Citation, and somehow, that unusual concept forces you to bring in the Source Lookup record. It would be very useful for me if you could clearly define what you mean by a Citation, a Source and a Source Lookup. These terms do not meet my common sense criterion that all record types must represent obvious and important concepts in the real world. An example would be very useful.

Thanks, and I hope we will eventually understand what the other is trying to say!

Tom

gthorud 2011-06-01T16:48:37-07:00

Tom,

I look at the E&C multilevel model primarily as a way to be able to link and unlink information believed to relate to the same person. It describes and links information related to one person, but is not good at linking all possible information that may back up the reasoning in citations – which may be all sorts of info, not only that being output for that person. Letting the citations refer to arbitrary info is one of the things I am trying to do.

My model allows at least three ways to do things, a multilevel E&C only solution, a one or two level (E&)C solution referring to the source data via citations, or a combination of the two that allows you to cite all sorts of data and also to merge/unlink person data. And, important, it tries to do it in a way that would let different implementations interwork – at least that is the intention.

There are in my view inherent problems with how the E&C model was originally proposed (see discussions on this page (and other pages over the months)), and we have been trying to come up with solutions to those problems – that is what the “superseded pointers” and the climbing note is trying to do, and we are still not finished with trying to work without imposing limitations on how reasoning is entered.

The Lookup record, as I currently see it – and this should not be difficult to see from the data it contains – is a way to bind together the various representations of a piece of source information. At the moment it also holds the “where in source”, “where in repository” and “who/when recorded it”, but that might change. As stated in the beginning of my document, the primary goal of my document is the overall design of the solution mentioned in the above paragraphs – and to create one document that captures a number of issues in one place, rather than being split into tens of discussion topics. What is important is the functionality, so I am open to discussions of the record structure as long as it will do what I want.

ttwetmore 2011-06-01T19:58:29-07:00

G > I look at the E&C multilevel model primarily as a way to be able to link and unlink information believed to relate to the same person. It describes and links information related to one person, but is not good at linking all possible information that may back up the reasoning in citations – which may be all sorts of info, not only that being output for that person. Letting the citations refer to arbitrary info is one of the things I am trying to do.

I agree. Each time you join a group at any level you must add a conclusion to hold your reasoning for making the joining, exactly analogous to citations at the bottom level. I don't know why you say it is not good at linking all possible info that backs up your reasoning. It's perfect for that -- you just add that to the Conclusion record attached to each non tier-1 record. When you decide you were wrong, and unlink a level, then "poof" the Conclusion disappears.
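
The join/unlink behaviour described here can be sketched in a few lines of Python. The names `PersonRecord`, `join` and `unlink` are hypothetical, and the conclusion is reduced to a plain string for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PersonRecord:
    label: str
    members: List["PersonRecord"] = field(default_factory=list)
    conclusion: Optional[str] = None  # reasoning for joining the members


def join(label, members, reasoning):
    """Create a higher-level record that groups lower-level ones."""
    return PersonRecord(label, list(members), reasoning)


def unlink(record):
    """Undo a join: return the untouched members and empty the record,
    so its conclusion disappears with it."""
    members = record.members
    record.members, record.conclusion = [], None
    return members


# Two hypothetical tier-1 records joined into one tier-2 record
a = PersonRecord("John Smith, 1838 residence")
b = PersonRecord("J. Smith, 1802 baptism")
joined = join("John Smith", [a, b], "Same name, and the age matches the baptism.")

released = unlink(joined)  # the researcher decides the join was wrong
print([r.label for r in released], joined.conclusion)
```

The tier-1 records are never modified by the join, so undoing it loses nothing except the reasoning that justified it.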

G > My model allows at least three ways to do things, a multilevel E&C only solution, a one or two level (E&)C solution referring to the source data via citations, or a combination of the two that allows you to cite all sorts of data and also to merge/unlink person data. And, important, it tries to do it in a way that would let different implementations interwork – at least that is the intention.

This has been the major goal of the DeadEnds model for over a decade! I am in complete agreement about this overall philosophy. I'd even say that "your" model is really "my" model as "I got there first!" Smiley, smiley.

G > There are in my view inherent problems with how the E&C model was originally proposed (see discussions on this page (and other pages over the months)), and we have been trying to come up with solutions to those problems – that is what the “superseded pointers” and the climbing note is trying to do, and we are still not finished with trying to work without imposing limitations on how reasoning is entered.

Maybe you could say what those problems are. I really think your solution is making things more complicated than they need to be.

G > The Lookup record, as I currently see it – and this should not be difficult to see from the data it contains – is a way to bind together the various representations of a piece of source information. At the moment it also holds the “where in source”, “where in repository” and “who/when recorded it”, but that might change. As stated in the beginning of my document, the primary goal of my document is the overall design of the solution mentioned in the above paragraphs – and to create one document that captures a number of issues in one place, rather than being split into tens of discussion topics. What is important is the functionality, so I am open to discussions of the record structure as long as it will do what I want.

It sounds to me as if your Lookup record is where you collect together any information you feel is important about a single item of evidence taken from a source. As I said before, that is the Citation record. I used to have this record in my earlier DeadEnds models. I called it the Evidence record. In that version of DeadEnds, there were Source records, then Evidence records extracted from the Source records (one Evidence record for every important item of information found in the Source), and then Person and Event records codified from the Evidence records. I didn't have any Citation records in that version either since all the info needed to generate citation strings in reports could be found in that structure of records. Currently I don't have the Evidence record in my model, as I think everything needed can be handled by the Source records and the codified Person and Event records generated from them. I leave open the option of reconsidering the Evidence record in the future.

Note that your current Citation record is almost the Conclusion record I mentioned in my latest post on the other thread.

Tom

ttwetmore 2011-05-26T12:52:10-07:00

Storing Evidence in Genealogical Databases

I started a thread on soc.genealogy.computing called “How Should We Store Evidence in Genealogical Databases?” The intent was to find out what practicing genealogists think are the best ways to record their evidence. Since this might be an important issue for Better GEDCOM I have summarized what has been said on the thread. You can refer to the news group for details.

Paper – some genealogists work directly from paper records; they like to move paper around on their desks. They don’t add anything to their databases until they decide whom the evidence refers to. Some assign id numbers to each record and store them in filing systems (e.g., lever arch files). When they decide which person a record refers to they might add the data directly to the person’s record or they might add a source reference with the id number.

Mixed – some genealogists work from a variety of formats, including paper, text or image files on their computers, CD’s, and so forth. They somehow keep track of where everything is, and like the people who use paper, they only update their person records when they decide what evidence refers to what person.

askSam and similar programs – some genealogists use note-taking software like askSam or CircusPonies. They create entries for each item of evidence, and then use the software to quickly search the text of the entries. These genealogists still keep their evidence in its paper or computer file forms, but they use askSam as a way to quickly retrieve and organize information extracted from the evidence.

Spreadsheets – genealogists who have lots of records and have a complex task deciding who the evidence refers to often summarize their evidence on spreadsheets. They may use any of the earlier listed methods to store their evidence, but then they depend on codifying some of the evidence in spreadsheet form to be able to use it effectively. Some genealogists use multiple spreadsheets, each geared to a different type of record. Some genealogists manipulate their spreadsheets by rearranging lines to group records that might refer to the same real person.

Person-based methods – some genealogists don’t see the point of worrying about this at all. They find evidence wherever they can and in whatever form they can, and if it applies to someone they are interested in they add that information immediately to the person’s record. The evidence exists only in person records. They might keep paper copies of certificates, or image files, but that’s more incidental than fundamental.

Event-based Software – some genealogists see event-based programs as the answer; they want to convert each item of evidence into an event record, and then have their normal person records refer to those event records. If the evidence mentions someone not yet in the database, they add the person and have it reference the event.

Persona-based Software – some genealogists see persona-based programs as the answer. But since there aren’t any such programs around, some genealogists use conventional lineage-linked programs to simulate it. They add new person records for every person they find in evidence, so they may have many person records in their database that refer to the same real person. When they decide that two person records refer to the same person they merge them. These genealogists lament the problems inherent in unmerging persons when they find mistakes.

Double-database Approach – this is a twist on the persona-based approach. One genealogist suggests two databases (both from the same genealogical program). In the first all person records are persona records (person records holding only and exactly information extracted from a single item of evidence). In the second all person records are final, lineage-linked persons. When the genealogist decides a persona record in the first database should be added to a person record in the second database, the persona record is copied to the second database and then merged with the person record. Since the persona record never disappears from the first database, undoing merges, though still awkward, can be done completely and correctly.

Roll Your Own Persona-Based Approach – one genealogist mentioned he has his own program based on personas. There are actually at least two such genealogists since I am one too.

Adding the Evidence to Source Records – In my introduction to the thread, where I suggested a few possibilities, I mentioned the approach of adding the evidence to the source records, either in note form or a structured form. I mentioned this because it is a popular Better GEDCOM idea, tied to the idea of generating detailed citations. However, no one mentioned using the approach, or commented on it.

Multi-tiered Persona and Person Approach – This is my chosen approach, but other than some discussion of New Family Search, which is two-tiered, there was little discussion about this approach.
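
A minimal sketch of the double-database idea described above, assuming a simple in-memory layout. The class and field names (Persona, Person, DoubleDatabase) are hypothetical, and the sketch keeps references to persona records rather than physically copying them into the second database, which is a simplification of what was described. The point it illustrates is only that a merge can be undone cleanly when the persona records are never altered.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Persona:
    """Information extracted from exactly one item of evidence."""
    persona_id: str
    source: str
    facts: tuple  # e.g. (("name", "John Smith"),)


@dataclass
class Person:
    """A conclusion person assembled from persona records."""
    person_id: str
    persona_ids: list = field(default_factory=list)


class DoubleDatabase:
    def __init__(self):
        self.personas = {}  # first database: never changed by a merge
        self.persons = {}   # second database: lineage-linked persons

    def add_persona(self, persona):
        self.personas[persona.persona_id] = persona

    def merge(self, persona_id, person_id):
        """Attach a persona to a person; the persona itself is untouched."""
        if persona_id not in self.personas:
            raise KeyError(persona_id)
        person = self.persons.setdefault(person_id, Person(person_id))
        if persona_id not in person.persona_ids:
            person.persona_ids.append(persona_id)

    def unmerge(self, persona_id, person_id):
        """Undo a merge by dropping the reference to the persona."""
        self.persons[person_id].persona_ids.remove(persona_id)

    def facts_for(self, person_id):
        """Rebuild a person's facts from the personas currently attached."""
        return [fact
                for pid in self.persons[person_id].persona_ids
                for fact in self.personas[pid].facts]
```

Because facts_for() always recomputes from the untouched persona records, an unmerge leaves no residue in the person record.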

Tom

louiskessler 2011-05-26T21:14:05-07:00

Welcome back Tom!

Everyone knows how I feel, and over the last few months, I've been becoming more confident in my concepts as I have worked to solidify them.

Without going into too much detail now, (because I don't have time tonight) my very strong belief is strengthened by what Mills 1997 says on page 44:

"Evidence should be drawn from a variety of independent sources."

To me, if evidence comes from sources, then it should be stored with the sources.

This allows 2 things:

1. Repositories can do projects to make lists of the sources they hold (Source records), plus non-interpreted (just the facts, ma'am) lists of all the evidence contained by each source (Evidence records).

2. People can go to the repository or go online and search the evidence for the names, events, places or dates that might be relevant to them. When they find an item of evidence, they copy/download it along with the source and repository info, and can add the conclusion people, events, places, dates and other info, along with their conclusions and a link to the evidence record that becomes a part of their personal genealogy record.

I think this is so amazingly simple and eminently practical.

Louis

ttwetmore 2011-05-27T00:37:00-07:00

Louis,

Thanks.

You probably noted that "storing evidence with their sources" was the only technique that no one at soc.genealogy.computing supported. I went to that news group to try to find a larger set of genealogists interested in the technical aspects of genealogy than there are at Better GEDCOM. I made "storing evidence with their sources" one of only three ways of supporting evidence that I mentioned, and I pointed out the argument that this could help generate citations. Nevertheless, no one there responded to the idea. This might just be a statistical anomaly, but I think Better GEDCOM should take seriously what a larger set of genealogists say about their needs and ideas for storing evidence. You will note that some of those ideas don't involve adding evidence to databases at all, so for some a simple tweak to GEDCOM would be sufficient. However, the responders included a number of genealogists far into the record-based realm of their research, and all of them indicated that storing evidence was important. Many of them, because of the inability of current software to store evidence, use spreadsheets and/or note-taking software to organize their evidence, and seem content with it. Those that want their genealogical apps to store their evidence generally prefer the persona approach, though some talk about "event-based" systems.

I don't see the "storing evidence with their sources" as practical in the record-based world. I believe that codifying evidence into persona and event records is the only way to handle record complexity. However I think it is a very simple extension to go from this:

"store the evidence with their sources"

to this:

"codify event and persona records from the evidence and have those records refer to their sources"

Tom

AdrianB38 2011-05-27T05:09:17-07:00

At the risk of ending up with 3 posters inclined to 3 different mechanisms....

Caveat - it does depend on what you call "evidence". According to Mark Tucker's definition on his Genealogical Research Process diagram, "Evidence = our interpretation of relevant information". Anyway, working from that....

While trying to mentally flesh out my research process description, I have become convinced that statements that don't come from "conventional" sources are just as important as those that do.

Consider, e.g., the statement that "All legally recognised marriages in England & Wales btw 1754 and 1837, had to take place in Church of England (or Wales) parish churches, specially designated chapels of those 2 churches, synagogues or Quaker Meeting Houses." (That may or may not be a full statement - I suspect that to use a synagogue or Meeting House, you had to be a member of those faiths, but let that pass).

This statement is as important to me as the contents of one particular marriage record since it offers a possible justification why the only marriage recorded btw people named X and Y should be that for "my" couple named X and Y.

At the moment, this statement just gets referred to in the notes for a source for a marriage record but arguably it should occupy a much more obvious place. I'd call this statement "evidence" (if not that - then what?) and clearly if it is to be formally stored, it should be stored just once. The question is - how to store that as evidence?

The most obvious thing is to store a source record for Hardwicke's Marriage Act (the relevant piece of legislation) but if we want the evidence to be separate to a degree from the source, then what? Creating an event for Hardwicke's Marriage Act to store the statement seems absurd. Apart from anything - how? I'm afraid creating a persona to store the statement seems as bad.

The only sensible way (I think) is to record the statement as a free-format statement, linked to the source-record somehow. Either as a separate, top-level entity or (in GEDCOM terms) as a level 1 thing under the level 0 for the source record for Hardwicke's Marriage Act.

I wonder how many people would consider they ought to have source records for legislation like that? A few, I suspect. I wonder how many will tell us "That's not what a source is about..." A few, I suspect.

Anyway - in trying to decide which way to go from the above, we might consider the infernal topic of negative evidence. Leaving aside the nagging view that "negative evidence" is an oxymoron (i.e. an inherent self-contradiction), consider the statement "There is no record of a baptism for Theophilus P Wildebeest in Bristol between 1800 and 1850". I would consider this to be a piece of negative evidence that also needs to be stored. Again, currently I just enter it as a note somewhere, but again arguably it should occupy a much more obvious place.

As above, it would seem the only viable representation is a free-format statement. Unlike the above, there isn't the option to store it as a level 1 thing under the level 0 for a source record since, by definition, there isn't one.

Thus, a separate, top-level entity to contain the evidence statement seems the only viable way forward. It would not be impossible to concoct imaginary sources to contain these statements but it seems an illogical thing to do if one could avoid it.

The idea of free-format evidence statements as top-level entities has one issue for me _personally_ though - it sounds awfully like GenTech assertions...
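
A minimal sketch of what a free-format evidence statement as a top-level entity might look like, assuming a hypothetical EvidenceStatement record with an optional link to a source record. The identifiers (E1, E2, S1) are invented for illustration. The only point is that a statement of negative evidence can exist without any source record to hang it on.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvidenceStatement:
    """A free-format statement stored once, as its own top-level record."""
    statement_id: str
    text: str
    source_id: Optional[str] = None  # None when there is no source to point at


statements = [
    # Linked to a (hypothetical) source record S1 for Hardwicke's Marriage Act.
    EvidenceStatement(
        "E1",
        "Legally recognised marriages in England & Wales between 1754 and "
        "1837 had to take place in specified places of worship.",
        source_id="S1",
    ),
    # Negative evidence: by definition there is no source record.
    EvidenceStatement(
        "E2",
        "There is no record of a baptism for Theophilus P Wildebeest in "
        "Bristol between 1800 and 1850.",
    ),
]


def statements_without_source(items):
    """Return the statements that stand alone, such as negative evidence."""
    return [item for item in items if item.source_id is None]
```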

gthorud 2011-05-27T06:35:26-07:00

Tom,

Not much new from usenet. Seems like there are a lot of people doing all sorts of things, some of them waiting for a better solution.

While you have been on usenet we have tried to take the models a step further, into a solution that could handle 1, 2 and multilevel implementations, and at the same time store the evidence with the sources. (Evidence will represent the content in the source. It will have a codified form when used in a multilevel solution, and may be stored in other forms.) The solution will allow a one level implementation to reference the evidence stored with the source, and a multilevel implementation to have its personas. A requirement is to do it in a way that lets programs supporting 1, 2 or multiple levels interwork.

Unless someone discovers flaws in this design, I think it can be finished within reasonable time. It will then be up to implementers to choose what solution they want, and the discussion on whether or not to implement a multi-level solution should preferably take place elsewhere. When the design is finished, we should move on to other areas that require work.

If someone discovers major flaws in the design of the handling of more than one level, I will personally prefer that someone table a more or less complete alternative solution, satisfying the interworking requirements stated above. If necessary, we should discuss that at some stage after the other areas of work have had some of the attention they deserve.

If there is no interest in doing any practical design, I will just conclude that there is really no interest in developing solutions for the requirements described in the postings above. I will then finish the sketch, fixing the loose ends, and then move on to other work, incl. refinement of storage of evidence with sources.

I see that Adrian’s ideas might end up with some refinements of citation structures. I don’t see them changing the overall structure, but will consider them when I see some concrete proposals in the research process area.

ttwetmore 2011-05-27T09:05:44-07:00

Geir,

While you have been on usenet we have tried to take the models a step further, into a solution that could handle 1, 2 and multilevel implementations, and at the same time store the evidence with the sources. (Evidence will represent the content in the source. It will have a codified form when used in a multilevel solution, and may be stored in other forms.) The solution will allow a one level implementation to reference the evidence stored with the source, and a multilevel implementation to have its personas. A requirement is to do it in a way that lets programs supporting 1, 2 or multiple levels interwork.

Sounds like a complete solution that should satisfy everybody. My only question has to do with exactly how evidence will be formatted when it is kept in the source records. I can imagine it as references to files or URLs, as free format text that summarizes or synopsizes the information, or as some kind of structured format that systematically codifies the information into structured fields in the source records. Has there been any thought about that?

Tom

ttwetmore 2011-05-27T09:07:53-07:00

Adrian,

At the risk of ending up with 3 posters inclined to 3 different mechanisms....

The more opinions the better!

Caveat - it does depend on what you call "evidence".

For me evidence is any information I find in any source that provides data about persons I am or might be interested in. This may be too narrow a definition.

While trying to mentally flesh out my research process description, I have become convinced that statements that don't come from "conventional" sources are just as important as those that do.

I certainly agree with you, but I don't think of those statements in the same way I think about evidence. My guideline would be that if you can take a statement and codify information in it into persona or event records, it's genealogical evidence. Other statements might help you interpret facts about people or events.

The only sensible way (I think) is to record the statement as a free-format statement, linked to the source-record somehow. Either as a separate, top-level entity or (in GEDCOM terms) as a level 1 thing under the level 0 for the source record for Hardwicke's Marriage Act.

Sounds like a good common sense approach to me.

Anyway - in trying to decide which way to go from the above, we might consider the infernal topic of negative evidence. Leaving aside the nagging view that "negative evidence" is an oxymoron (i.e. an inherent self-contradiction), consider the statement "There is no record of a baptism for Theophilus P Wildebeest in Bristol between 1800 and 1850". I would consider this to be a piece of negative evidence that also needs to be stored. Again, currently I just enter it as a note somewhere, but again arguably it should occupy a much more obvious place.

I still have nothing (intelligent or otherwise) to say about negative evidence. For me it's never been a big deal so I haven't worried about incorporating it into my models. My common sense says to create a persona with the person of interest's name and add a single note with the negative evidence to that persona.

Tom

gthorud 2011-05-27T09:51:26-07:00

Tom,

Go read the proposal, then we can discuss.
URLs are a tiny detail in the overall structure, we will come to that.

ttwetmore 2011-05-27T11:48:51-07:00

Geir,

Please provide a link to the proposal.

ttwetmore 2011-05-27T12:01:55-07:00

This is a long post. I extracted it from the soc.genealogy.computing thread "How Should We Store Evidence in Genealogical Databases?". This post is by Richard Smith. It makes for very interesting reading and it has a strong bearing on this topic.

Tom

Richard Smith writes:

I regard genealogical research as a seven stage process, and I tend to
handle the data generated at each stage in different ways.

1) Planning

Sometimes I've got a specific objective in mind -- something like
"find out who Thomas Smith's parents are". For each of these
objectives, I create a text file with a few notes about where might be
a good place to search for evidence, where I've already looked, and a
mixture of speculation and notes to myself. I name the file by
surname, name and some additional suffix (say "the boot-maker") to
make the person unique; if there's more than one plan per person
(there rarely is), I'll disambiguate it in some further way. I also
use symlinks (a bit like Windows shortcuts) to maintain an index of
such plans by ancestor number in a separate directory.

As I've got further back, I've found more and more frequently I don't
have such a specific objective. The ultimate objective is usually to
push back one or more generation, but I'm no longer specifically
targeting records with that individual in mind; instead, I'm gathering
as much information as I can do on the surname in the area. I have a
directory with a more general set of plan files with just a surname
and area (typically a parish name somewhere near the centre of the
area of interest).

I use a revision control system (currently CVS) to keep track of
changes to these plan files, and also to assist in backing them up.

2) Searching

Whenever I search for something, I try to note the fact that I've done
it in one of the plan files. This is particularly important if the
search fails to find anything. If I'm in a records office, I tend to
have a printout of the plan file and scribble on it, typing the notes
up later. Sometimes I do the same for on-line research.

I find on-line sites such as ancestry.com and familysearch.org
particularly troublesome in this regard -- it's far too easy to spend
an hour or two searching for things and forgetting to note anything
down. Neither site keeps a log (at least, not that's available to the
user) of what you've searched for, so you can't go back and write it
up later.

For this reason I no longer use familysearch.org directly. The only
time I ever used it was to look up things on the IGI, so I wrote a
perl script to drive the (old) site, do searches for me, download the
full data set as GEDCOM and log each search I do to the appropriate
plan file. The program requires me to associate the search with a
specific plan, so I can't avoid recording the fact I've done a search.

Putting these search logs into a database, and associating them with a
source and/or repository, would be an obvious improvement. I did
briefly experiment with gnote and mediawiki for the plan files but
gave up -- I found them both overkill for what I wanted.
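
A minimal sketch of the search-logging habit described above, written in Python rather than Perl and assuming a made-up one-line log format. The function name log_search and its parameters are hypothetical; nothing here talks to familysearch.org or any other site. It only shows how every search, including one that finds nothing, can be forced into a named plan file.

```python
from datetime import date
from pathlib import Path


def log_search(plan_file, repository, query, result_count, when=None):
    """Append one search-log line to a plan file and return the line.

    plan_file is required, so a search cannot be logged without
    being tied to a specific plan.
    """
    when = when or date.today()
    outcome = "nothing found" if result_count == 0 else f"{result_count} hit(s)"
    line = f"{when.isoformat()} SEARCH [{repository}] {query}: {outcome}"
    with Path(plan_file).open("a", encoding="utf-8") as handle:
        handle.write(line + "\n")
    return line
```

Because the file is opened in append mode, the log sits alongside whatever free-text notes the plan file already holds.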

The result of the search will vary. It might be a piece of GEDCOM (as
per the example above), or an image (e.g. a census image on
ancestry.com), or an entry in a book (in which case I may or may not have
been able to make a copy of it). Any paper copies I do end up with
get scanned, and everything gets stored in directories, classified by
type of record and surname. I'm not a big fan of putting things like
images in a database, though indexing them in a database would be
useful. At the moment, the only index I have is the directory
listing. (As with plan files, I sometimes use symlinks if one
document should be filed in multiple places.)

3) Transcribing

Having found a document, the next job is to transcribe it. Often the
result is a flat text file, again one file per source. I try to
transcribe the document as accurately as plain text will allow, and
there's the odd bit of ad hoc mark-up in it to document important bits
of formatting: e.g. [struck-through: my daughter Isabella] or
[inserted: Hampshire]. I very much like the idea that Nick Matthews
suggested elsewhere of using XML for this, and may well start doing
so.

In longer documents, such as wills, I tend to put asterisks around
people's names to assist in searching; similarly, I often add ISO-style
dates in square brackets [2011-05-24]. I don't do similar tagging for
place names, though if I move to a light-weight XML format, I probably
will do.
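
A minimal sketch of how the ad hoc mark-up described above could be pulled out of a transcript with regular expressions. It assumes names sit between single asterisks, dates appear as [YYYY-MM-DD], and formatting notes use the [struck-through: ...] and [inserted: ...] forms quoted above. The sample sentence is invented for illustration.

```python
import re

NAME = re.compile(r"\*([^*]+)\*")
ISO_DATE = re.compile(r"\[(\d{4}-\d{2}-\d{2})\]")
NOTE = re.compile(r"\[(struck-through|inserted): ([^\]]+)\]")


def extract_markup(text):
    """Collect tagged names, ISO dates and formatting notes from a transcript."""
    return {
        "names": NAME.findall(text),
        "dates": ISO_DATE.findall(text),
        "notes": NOTE.findall(text),  # list of (kind, content) pairs
    }


# Invented sample line, not taken from any real will.
sample = ("I give to *Isabella Smith* [struck-through: my daughter Isabella] "
          "my house in [inserted: Hampshire] this day [2011-05-24]")
```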

In other cases, the source is essentially a long table. Baptism
registers or census forms are a good example of this. In these cases,
I use a tab-separated text file to record each field. That makes it
easy to import into a spreadsheet or database, but at present the
primary version is simply in the text files. Sometimes I'll use a
spreadsheet to create them too, especially if I'm entering a large
number by hand. If I need to add extra notes, they end up in the
rightmost column.

Tabular data of this sort is, again, an example of something that
could usefully go into a database.

At the moment, the text files get stored in CVS to retain a version
history and to back them up.

4) Translating

This stage is often irrelevant as the source is often in English (the
only language I speak fluently). When it is necessary, I put the
translation below the original transcript, in the same file as it.
Even in English documents, there's sometimes an element of
translation: for example, I'll add a note to remind myself what I
think some obscure word or abbreviation means.

5) Extracting

This is the stage that seems to be causing all the excitement here.
It is when I extract the genealogical content from the source and put
it into some computer-readable form. Typically I use GEDCOM as the
destination format, simply because of its ubiquity. Sometimes I find
GEDCOM inadequate for the purpose. For example, if a will mentions
two grandchildren but gives no indication of whether the grandchildren
are siblings, there's no way of expressing this in GEDCOM. In such a
case, I'll either misuse GEDCOM to express what I need as best I can,
or simply not bother extracting that bit of information (perhaps
instead putting into a text note).

For things like censuses, baptisms and so on, because the result of
the transcription is already in a nice easy-to-parse tabular form, I
have scripts that automatically create GEDCOM from the tables.
Sometimes it needs hand editing afterwards to add some extra
information that was in the source, but outside of the expected data
-- for example, I once found a census on which two children had been
grouped together with a big "}" and "twins" written next to them. In
earlier baptism registers, the data is often more or less tabular, but
with implicit fields recording whatever the priest felt was necessary;
and occasionally an entry will have extra information included. Such
cases need manual handling.
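
A minimal sketch of the table-to-GEDCOM step described above, assuming a tab-separated baptism table with the hypothetical column names given, surname, date, place and notes, and ISO dates in the date column. The output is a fragment of INDI records only (no HEAD, source record or TRLR), and the source pointer S1 is a placeholder. The scripts mentioned above are not shown in the post, so this is not their actual logic.

```python
import csv
import io

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def gedcom_date(iso):
    """Convert YYYY-MM-DD to the GEDCOM 'D MON YYYY' form."""
    year, month, day = iso.split("-")
    return f"{int(day)} {MONTHS[int(month) - 1]} {year}"


def baptisms_to_gedcom(tsv_text, source_id="S1"):
    """Turn tab-separated baptism rows into GEDCOM INDI records."""
    lines = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for number, row in enumerate(reader, start=1):
        lines += [
            f"0 @I{number}@ INDI",
            f"1 NAME {row['given']} /{row['surname']}/",
            "1 BAPM",
            f"2 DATE {gedcom_date(row['date'])}",
            f"2 PLAC {row['place']}",
            f"2 SOUR @{source_id}@",
        ]
        # Free-text remarks from the rightmost column become a NOTE.
        if row.get("notes"):
            lines.append(f"1 NOTE {row['notes']}")
    return "\n".join(lines)
```

Anything outside the expected columns, such as a bracket marking two children as twins, would still need hand editing afterwards.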

I've also got a number of scripts that create blank bits of GEDCOM --
templates, if you like -- that I can then fill in. That fills in
suitable source information.

The result is hundreds of small GEDCOM files, one per source. Some
(e.g. from a gravestone) just contain a single individual and little
else; others (e.g. from a parish register or from an IGI search) may
contain hundreds of individuals, some of whom may be duplicates (for
example, if a couple have three children baptised, then the parents
will appear three times).

These GEDCOM files then get stored in CVS -- even the automatically
generated ones. I will sometimes upload them into a genealogy
program, but as I've not really settled on one that I like, I regard
the GEDCOM as the primary version and never (well, rarely, anyway) use
the program to make changes. It's just a tool to help me process or
visualise the information. I've also experimented converting the
GEDCOM to RDF and importing into an RDF processor (typically the
Redland one) so that I can run SPARQL queries against it. This is
really powerful, but also painfully tedious to use. I do see a future
for something like this, though.

I've also got a script that can search a directory tree of GEDCOM
files looking for people that match specific criteria -- at the
moment, it's pretty primitive, basically just doing name, date at a
particular event, role in the event. It was originally designed for
looking for baptisms, but has expanded a bit.
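
A minimal sketch of a search across a directory tree of small GEDCOM files, assuming the files end in .ged, are UTF-8 text, and write surnames between slashes in NAME lines. It matches on surname only, which is simpler than the name, date and role matching described above, and the function name find_people is hypothetical.

```python
from pathlib import Path


def find_people(root, surname):
    """Yield (file name, xref, name) for INDI records whose NAME holds /surname/."""
    wanted = f"/{surname.lower()}/"
    for path in sorted(Path(root).rglob("*.ged")):
        xref = None
        for raw in path.read_text(encoding="utf-8").splitlines():
            parts = raw.strip().split(" ", 2)
            if len(parts) < 2:
                continue
            if parts[0] == "0":
                # A new level-0 record starts; remember it only if it is an INDI.
                is_indi = len(parts) == 3 and parts[2] == "INDI"
                xref = parts[1] if is_indi else None
            elif xref and parts[0] == "1" and parts[1] == "NAME" and len(parts) == 3:
                if wanted in parts[2].lower():
                    yield path.name, xref, parts[2]
```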

6) Reasoning

This is the stage that most people think of as genealogy. It's where
I try to work out how I need to combine the persona-level data
extracted from the sources into real people. Was the John Smith in
the 1851 census the one baptised in North Dunny or South Dunny, or
maybe neither? This typically involves looking through all of the
extracted persona-level data for people with the same (or a similar)
surname in the locality over quite a long period. I tend to the view
that unless I can understand every instance of the surname in the source
record, I cannot be confident that I've pieced it all together
correctly. (And sometimes even then I can't be confident of it.) An
unexplained burial could be
evidence that what I had considered to be one family was in fact two,
for example, and that might have knock-on-effects elsewhere.

How I work at this stage depends on how many people I have. Sometimes
there are few enough personae that I can keep everything in my head.
For larger groups, I tend to print things out and spread everything
out on my dining room table. In the very largest cases that's
infeasible. For example, I once had an ancestor called John Smith and
all I knew was that he was a cobbler, from Southampton, and an
approximate date of birth from the 1841 census. Trying to sort out
all of the Smiths in a big town was a complex task. (In the end I
discovered evidence that he wasn't actually from Southampton after all
-- he'd just lived there for a while before his marriage.) In that
case, I created a spreadsheet with everyone in. (And I still use an
extended version of that spreadsheet as an index to the other
records.)

Once I've sorted things out into groups, then I enter them into Gramps
(my current preferred program). I'll import bits of the persona-level
GEDCOM because that's a convenient way of keeping source information
with it. (Irritatingly I have to strip the repository from the GEDCOM
and manually reassign it because Gramps can't, so far as I know, merge
repositories as it can with other things, but that's a minor
difficulty.)

But what this doesn't do is give me any way of documenting why I've
merged the personae as I have. Sometimes this will be immediately
obvious from the sources; but other times it won't. But at times, the
reasoning process is more sophisticated. I often start with a large
number of possibilities, consider each one and gradually discount
possibilities as being too improbable until only one remains which for
the time being I regard as probably correct. Documenting such things
is tricky, but I really do care about documenting things: not
primarily to justify my conclusions to others (though that is useful),
but so that I can easily revisit them as further evidence comes to
light, or as I correct any mistakes.

At present, I use the plan files that I create right at the start of
the whole research process to add notes on why I came to the
conclusions I did. But this means the documentation behind the
merging process is not kept with the merged individuals; nor is there
a computer-readable link from the source to the documentation. I
really want there to be so that if I have to correct a mistake in my
transcription / translation / interpretation of the source, I can
readily see what knock-on effects it might have.

7) Presentation

The final step is presenting the data in a good way. That might mean
drawing trees (which many programs seem quite poor at), drawing
ancestor tables (which they're much better at), or maybe just
producing indexes of people. But this step is really beyond the theme
of this discussion.

Like most people, I expect, in practice, these seven steps often get
blurred together, or some of them are not relevant. But whenever I
find myself thinking about how to store some new sort of data, or how
to rearrange the way I file things, I do find it very useful to think
in terms of these seven steps.

Richard

gthorud 2011-05-27T12:25:18-07:00

Tom,
http://bettergedcom.wikispaces.com/message/view/Defining+E%26C+for+BetterGEDCOM/39358416

gthorud 2011-05-27T13:45:39-07:00

I suggest the discussion of various ways to record data from sources should take place on the Rec cat Data09, except the codified data related to the E&C model.

gthorud 2011-05-30T12:20:00-07:00

Citations in a multilevel model

The following is based on the document presented here http://bettergedcom.wikispaces.com/message/view/Defining+E%26C+for+BetterGEDCOM/39358416

There are two issues:

Issue 1: Tom, in his comments on the document, seems to think that Conclusion persons should not refer to citations. If I have understood this correctly, the question is why?

Issue 2: Assuming that citations can be referred to by a CP, how should person level citations be handled when a multilevel hierarchy is collapsed into one level on import? I see no other solution than to remove the person level reference, and attach it to each event, gender, note and name in the lower level person record, before merging the record into the next higher level. It is the only way to know what the citation applies to.
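
A minimal sketch of the collapse described in Issue 2, assuming each person record is a plain dictionary with a person-level "citations" list and an "items" list holding its events, gender, notes and names. This layout and the function names are hypothetical and are not taken from the referenced document. The sketch removes the person-level reference, attaches it to every item of the lower-level record, and only then merges the items upward.

```python
import copy


def push_down_citations(record):
    """Return a copy with person-level citations attached to every item."""
    result = copy.deepcopy(record)
    person_level = result.pop("citations", [])
    for item in result["items"]:
        item_citations = item.setdefault("citations", [])
        for citation in person_level:
            if citation not in item_citations:
                item_citations.append(citation)
    return result


def merge_into(higher, lower):
    """Merge a lower-level record's items into a higher-level record."""
    merged = copy.deepcopy(higher)
    merged["items"].extend(push_down_citations(lower)["items"])
    return merged
```

After the merge each item still carries the citation that applied to it, so nothing depends on a person-level reference that no longer has a single record to point at.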

GeneJ 2011-06-03T19:14:53-07:00

Hi Tom,

I understood "string" to be in that meaning.

I intended to communicate that a citation in the real world is more than a description of "where in a source an item of evidence comes from."

ttwetmore 2011-06-03T20:05:14-07:00

And I continue to believe that a citation is a description of "where in a source an item of evidence comes from."

ttwetmore 2011-06-04T05:16:21-07:00

I described the three worlds of Evidence, Conclusion, and Biography, for the purpose of demonstrating three types of information we might want in our genealogical databases.

The main reason for laying out these three worlds was to try to demonstrate that the citation-like/note-like/footnote-like things are different for the three worlds.

For the Evidence world we should record where the evidence comes from. This involves specifying the Source the Evidence comes from, and whatever other information we wish to specify to describe the Evidence. One of these extra things is where in the Source that particular item of Evidence comes from. This concept has been called a Citation for a long time, like centuries, certainly since way before ESM was born. Information in Citations is often used, as we know, to generate Footnotes, Bibliographic Entries, etc.

For the Conclusion world, a higher layer in what we now seem to be calling the multi-level world, when we decide that different items of Evidence refer to the same Person we need a way to somehow "bind" those items of Evidence together. The Person/Persona ideas we've been discussing provide that binding mechanism, but we need to document/justify why we perform each such binding. If we do this correctly we must add information to our database to justify the bindings. I've been calling this information a Conclusion, and have stated before my opinion that a Conclusion is to a conclusion-level Person as a Citation is to an evidence-level Persona. The purpose of a Conclusion record is identical to that of a Citation record -- to document where the conclusion came from, that is, its justification. Obviously Conclusion records can also be used to generate Footnote and Bibliographic entries and so forth.

For the Biography world, where we are preparing to generate a report that fully describes our research and our conclusions, we need to generate a well-formatted document. All current genealogical programs claim that they provide report generating features. But as we all know these reports are a few standard charts, and a few very unsatisfying reports that generate totally stilted English from certain items of information (e.g., birth and death events). If we really want to crack the nut of high quality document generation, we have to address the issue of getting the text we want to appear about different persons, and the footnotes or other justifications we want to appear along with that text generated. Obviously that text has to get into our database somehow.

I believe we are all arguing about and defining and redefining the word Citation, simply because we are mixing together three very important concepts -- 1) the Citations that describe where Evidence comes from; 2) the Conclusions we make when we decide what items of Evidence refer to the same Person; and 3) the Footnotes and Bibliographic Entries that we want to see in our research reports, entries that might be research notes, detailed descriptions of the implications inherent in one item of evidence, an elaboration of the reasons a particular conclusion was made, a reference to an important historical event that affected persons in the report, and so on, without any limit. The Bibliographic entries in a report would include all three of these things: the Evidence Citations, the Conclusions, and all the other Footnote-like things added.

I believe the biographic text and the footnote-like entries that are added in the Biography world should be additional data-model-defined attributes we add to Conclusion Person and/or a new record or two.

Anyhow, I think all the arguing and disagreement about the term Citation comes from different opinions about what to call information that can show up in Footnotes. Clearly for me, the word Citation has its classic definition, but to others its meaning has been broadened to include Conclusions and any kind of reference note or other kind of information at all. All three of these types of things are needed for Footnotes and for Bibliographic Entries. If we could simply agree about this three-level view of where Bibliographic Entries come from we would be much better off. If we were to agree that all three of these types of things are going to be what we call Citations, though I would disagree with it, I would go along with it -- but only if we actually thought about what we were doing, and how we are mixing together three different concepts from the "three world" data-model of Evidence, Conclusion and Biography.

I tried valiantly to point all this out in my "Footnotes Are Not Citations" thread, but maybe I was being too subtle. Some of us hold to the view that anything that might appear in a Footnote in a report should be called a Citation. Some of us hold that a Citation is something that describes where Evidence comes from and can appear as one type of Footnote.

gthorud 2011-06-04T07:36:16-07:00

There is also something called the real world, and in it is something called summer. I plan to enjoy our short summer so I will not have as much time to discuss. Also with heat, my shoulder has become worse, so I can’t type much, and it may take a while before I can contribute much - we will see.

GeneJ 2011-06-04T10:02:16-07:00

We will miss you, Geir, and hope as much for your speedy recovery as we do your summer enjoyment. --GJ

gthorud 2011-06-08T07:27:33-07:00

I have read through several recent posts and want to sum up my views. I am sorry that I am not able to post as frequently at this time of the year.

Although my Lookup record contains some of the info as in a citation, it is not simply referring to a source, where in it, and a repository. Well, it contains that info, and more, as can be seen in my document. It links various representations of the relevant information found in that place in the source. So the Lookup is not simply a citation. Linking the various representations of data derived from somewhere in a source is useful because it allows you to compare the various interpretations, it allows you to see the data as structured in the source, it allows you to see only the codified data derived from for example the content of a record/paragraph/household/etc. in a source (not everything on a page), and it allows this with or without multi-level persons. (An additional benefit could be the ability to order what could be called “primary” codified data (the census event from a census record) before “secondary” events (e.g. the birth event constructed from an age in the census)). I will change the Lookup record so it can refer to events and persons.

By linking transcriptions (in various representations) or images of sources to source records, you can go through that transcription and the Lookup will tell you what info in a source has been utilized, and for what purpose – e.g. which events and persons are based on a particular record. You will also be able to see which data have not been utilized, and e.g. avoid using the same birth record for two different conclusion persons. See also Req Cat Data09. You can also see if the source info has been utilized in a citation, even if the info is not contained in a Persona of the Conclusion person citing the source. This way of working will become more and more important as the amount of digitized sources increases.
If a program does not implement Personas, it should be a simple task to merge codified source info from such transcriptions into a 1-level conclusion person, but my model also allows you to include the info as personas subordinate to conclusion persons. It should then be up to implementers to choose what to implement.

The citation record is the key record that would trigger the output of a footnote, endnote or inline note. Although I do not think our structure should be restricted by definitions, rather than using the term “citation” it seems like the term “Reference note” used by EE could be a better term. It can contain a citation or a comment (without any source reference), but its EE-definition will have to be extended with inline citation which is supported by programs and style guides. Also, the definition in EE states that “the reference note’s purpose is to identify and/or discuss the [content in] the source” so its function is not limited to identify a location in a source.

The citation record, hereby re-baptized “Reference note”, does not contain all the info output in the reference note, it also refers to the Lookup record for additional data. I realize that allowing a citation record to be referred to by several events, persons, etc. is not the most important issue, although it would have saved some paper and ibids, so I will drop that capability. (Some programs allow this feature by finding identical content of reference notes.) This change MAY remove the need for having a Reference note as a RECORD, rather it may become a common structure used in various places. There is a sound basis for letting a reference note refer to several sources in style guides (many programs can actually generate multi-source reference notes), and it seems reasonable to be able to let some argument refer to several sources, so I will keep that capability.

One important feature of the Reference note and Lookup (combined) is to be able to refer to codified Persona records (or other representations of the source content, e.g. events), so you can refer to codified data from any record, and you can refer to codified data being “part of” any person – not only the Personas for the Conclusion person that refers to the Reference Note (as is the case when you link Conclusion persons to subordinate Personas). (I would not be surprised if the Research process data structures would also want to reference a Lookup, and maybe even a Reference note record, showing the data in the puzzle on your desktop during the research process.)

Tom wants ONE place to record why two persons are joined, but does not state what the benefits of this are. I don’t see a need to restrict that statement about “why” to appear in a particular place, it can go many places – in a reference note, in an ordinary note at the person level, in a research note at the same level (which does not output) and/or in some of the structures that may result from Adrian’s Research process work. And I don’t see any reason to restrict “the statement about why these persons are the same” to not refer to one or more sources or restrict it to always refer to a “MY BRAIN” source – it may not need a source at all.

And, there are other things to make conclusions about, not only why you believe they are the same person – conclusions can be about anything.

It seems Tom is accepting that a Conclusion person, having no subordinate persona, can have facts referring to sources via a Reference note, so we are now able to import Conclusion only persons from 1-level programs. But, I am not sure if he is now accepting that facts in a Conclusion person can refer to sources via a Reference note, even if it has subordinate Personas. If you prevent a Conclusion Person record from referring to any source, it will, as I have stated above, limit how the researcher can document things and it will also prevent e.g. a merged event in a conclusion person from referring to sources (a merged event resulting from e.g. two superseded Persona events) – I do not understand how Conclusion persons with subordinate Personas can work without being able to reference sources.

About the Biography world. I am not sure I understand the benefits of this separation into worlds, it seems like the model is more important than practical considerations. I am not sure I understand what the Biography world is. Since my main purpose of entering data in a genealogy program is to be able to produce reports and charts, I have had focus on that need all the time. Is the suggestion that we have to record the information more than once? We may also end up with a “Research process world” where at least the reasoning could be recorded, and the reasoning could end up in the Conclusion world, and then we have to record it a third time in the Biography world. That is a lot of recording.

Although I have no requirement for a multilevel E&C-model, I have been working on a design that takes into consideration the arguments against the original design, and at the same time allows referencing evidence data from sources without implementing multi-level persons (which has been requested by other participants), and also allows data transfer between programs implementing different options. I don’t think constraints imposed by the various “worlds”, that I don’t see any reason for, should prevent such a solution. If it turns out that it is not possible to get agreement on a solution with multi-level E&C in the near future, I will just document how I see it could be done, and move on to other areas that have been waiting for a long time.

AdrianB38 2011-06-08T14:02:56-07:00

"the definition in EE states that 'the reference note’s purpose is to identify and/or discuss the [content in] the source' so its function is not limited to identify a location in a source."
Indeed - though in that case I would prefer that the different aspects were separated in the data model and in the consequent database / BG structure. That way (a) it's clearer what bit is what and (b) software can operate on the right part. (NB - you may have separated them, Geir, I've not gone deeply enough into your work to know one way or the other - I just felt I ought to make the comment that the stuff should be analysed and split.)

"Tom wants ONE place to record why two persons are joined, but does not state what the benefits of this are". I think we ought to have an agreed spot where the logic is stored, in the same way that we have an agreed spot where "citations" are held. Otherwise - where will the app look? Now, having said that, if someone ignores our nice new structure and just writes everything in a note, then all bets are off, of course.

"I don’t see any reason to restrict 'the statement about why these persons are the same' to not refer to one or more sources or restrict it to always refer to a 'MY BRAIN' source – it may not need a source at all."
I agree - I think. What it _should_ refer to is actually what _any_ resulting value should refer to - namely, it should refer to a conclusion statement, which should refer to a "proof", which should refer to the evidence used, which (only then), should refer to the source that the evidence came from. And this applies to why 2 evidence-persons are the same conclusion person as much as anything. And it may be there are several sources and it may be that a piece of evidence has no source - ESM says we do not need to source the dates of the American Civil War, e.g., which could be used as a piece of evidence. (OK - you might not - I would, from my side of the Atlantic!)
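
The chain of references described above (value, then conclusion statement, then "proof", then evidence, then source) could be sketched roughly as follows. This is only an illustration: the class names, field names and sample strings are invented for the sketch and are not taken from the Research Model or from any BetterGEDCOM draft. Note that a piece of evidence is allowed to have no source at all.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Source:
    title: str


@dataclass
class Evidence:
    text: str
    source: Optional[Source] = None  # None: evidence that needs no source


@dataclass
class Proof:
    argument: str
    evidence: List[Evidence] = field(default_factory=list)


@dataclass
class ConclusionStatement:
    statement: str
    proof: Proof


def sources_behind(conclusion: ConclusionStatement) -> List[str]:
    """Follow conclusion -> proof -> evidence -> source; skip unsourced evidence."""
    return [
        ev.source.title
        for ev in conclusion.proof.evidence
        if ev.source is not None
    ]


# Invented example: why two evidence-persons are one conclusion person.
census = Source("census return")
register = Source("parish register")
same_person = ConclusionStatement(
    statement="These two evidence-persons are the same conclusion person",
    proof=Proof(
        argument="Name, age and birthplace agree across both records",
        evidence=[
            Evidence("age and birthplace as enumerated", census),
            Evidence("baptism entry", register),
            Evidence("dates of the American Civil War"),  # no source needed
        ],
    ),
)

print(sources_behind(same_person))
```

The source is reached only through the evidence, so the same structure serves a conclusion with several sources, one source, or none.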

"facts in a Conclusion person can refer to sources" Normally, the facts would come from the lower level evidence persons (if you have them). The only way I see facts appearing for the first time at a higher level is if you don't follow the persona route but record your evidence otherwise; or if the fact value that you settle on is different from the evidence - e.g. date and month from one source but year from another. In that last case, the fact value appears for the first time at a higher level and it should be pointed to a justification - either a conclusion statement (in my Research Model) or a "My Brain" source if not. Which I think you refer to, Geir, with "a merged event in a conclusion person".

"About the Biography world. I am not sure I understand the benefits of this separation into worlds" I rather like it - being of an analytical mind, I dislike these great long essays in footnotes because anything could get into them. (I suspect the actual situation is not quite as bad as I fear). But if we split the bits for each world, then they could be concatenated for the final print. And I don't imagine we would ever enter anything twice. Just cross refer.

gthorud 2011-06-08T14:34:22-07:00

I'll comment on mainly the last paragraph for the moment.

I think we should try to advance your work on the research process to identify the various places where "conclusion arguments" could be recorded.

As you may have noted in the last part of the long document with comments I wrote on the research process page, I was concerned about having to record things twice and at least tried to ask the question if it was possible to map (cross reference) the research process data into fields that are used to generate narrative output. One challenge is that there are different preferences with respect to where the info will end up in reports. And it may be difficult to produce natural language when you for example concatenate sentences written in different contexts and at different times.

ttwetmore 2011-06-08T15:24:52-07:00

Geir,

Thanks for your long post. I'm not going to write a long reply. I've had my say more than enough times.

My concern is that Better GEDCOM is on a tangent trying to deal with a complex taxonomy of citations, reference notes, conclusions, sources, source lookups, source extraction records, evidence formats, etc, and not dealing with the core entities of the genealogical data model. It looks to me like Better GEDCOM has lost its way. I don't know if this is a pendulum that will eventually swing back to persons and events. I fear we are stuck in an area of esoterica that almost no one will understand or follow. I will no longer try to unravel all the complexities introduced by those who think the key to Better GEDCOM is to refocus around the concepts of generalized citations and research notes. I think it is very easy to add those things and we've had the ability forever via notes.

We have taken a relatively simple job of trying to understand how to record the info needed for ESM citations, which I think is very easy to do, and turned it into a complex task.

Better GEDCOM should deal primarily with entities for sources, persons, events (and places). Your source lookup record can be accommodated as a general purpose evidence record that can be added to stand between codified records and source records, holding citation fields and giving alternate forms of the evidence. I've gone back and forth on the need for that record for 15 years, and I think it can be put to good use for genealogists truly patient or anal enough to want to create them. Better GEDCOM should have records for recording decisions and conclusions. Beyond that, if you want to record complex footnotes, or complex, multi-sentence descriptions, ANYTHING that you would like to be added to a report, then Better GEDCOM should have structures defined within person records for those statements. Report generating features can then pick up those statements and those footnotes and add them into the rest of the automatically generated parts of report outputs. If you want to record your research notes in a log, fine, have an entity for a log and an entity for a log entry, and let the user put anything they like in the entries.

I will try to desist from commenting further.

Tom

ttwetmore 2011-06-08T16:17:05-07:00

"Tom wants ONE place to record why two persons are joined, but does not state what the benefits of this are. I don’t see a need to restrict that statement about “why” to appear in a particular place, it can go many places – in a reference note, in an ordinary note at the person level, in a research note at the same level (which does not output) and/or in some of the structures that may result from Adrian’s Research process work. And I don’t see any reason to restrict “the statement about why these persons are the same” to not refer to one or more sources or restrict it to always refer to a “MY BRAIN” source – it may not need a source at all."

In a multi-tier system there is one and only one place in the tree where the two persons are joined. That point is the point where the researcher has decided that the two persons are the same. I'll repeat with slightly different phrasing. That is the PRECISE POINT where the researcher has reached the conclusion that the two persons are the same. This is therefore the point at which the researcher should justify making that decision/conclusion. This is nothing more than the scientific method, where each step in a proof, of which this is a very important step in a genealogical proof, requires a formal justification. Think Euclid for genealogy. Aren't the benefits obvious? They are to provide convincing arguments why the conclusion (the joining) was made. Tom Jones would certainly see its benefits!

One must view the multi-tier tree as a static map that represents all the research that has been done. Each persona comes from one of the records the researcher has found. Each person in a higher tier represents an inference (see Tom Jones) or a conclusion or a decision that the researcher has made during his/her research. The map has collapsed out the time dimension (you can't tell when each record was added relative to the others, or when the decisions were made) but the tree still represents the entirety of the E&C process that the researcher has gone through up to the current point in time.

Certainly the researcher can refer to this decision anywhere in later footnotes or written text; there should be no restrictions on free notes. The point is that this is the exact point where the decision was made so it should be justified at that point.

Isn't this obvious? It isn't a matter of "Tom wants..."; it's a matter of "It's the scientifically correct thing to do."

ttwetmore 2011-06-08T23:47:43-07:00

"It seems Tom is accepting that a Conclusion person, having no subordinate persona, can have facts referring to sources via a Reference note, so we are now able to import Conclusion only persons from 1-level programs. But, I am not sure if he is now accepting that facts in a Conclusion person can refer to sources via a Reference note, even if it has subordinate Personas. If you prevent a Conclusion Person record from referring to any source, it will, as I have stated above, limit how the researcher can document things and it will also prevent e.g. a merged event in a conclusion person from referring to sources (a merged event resulting from e.g. two superseded Persona events) – I do not understand how Conclusion persons with subordinate Personas can work without being able to reference sources."

A conclusion person with no subordinate personas, if it is a good genealogical citizen, will have its individual facts refer to separate sources. This is clear and is what programs of today allow.

A conclusion person with subordinate personas does not need sources for the information coming from the personas, but does need a "source" for the overall conclusion that conclusion person represents. Since such a person record is a conclusion its only "source" is the reason the conclusion was made. I call this "source" a source in the DeadEnds model, by the way, as for me a source is whatever information is needed to justify the existence of the item the source refers to.

Nevertheless, DeadEnds puts no limitations on whether or not facts or whether or not records as a whole can refer to sources regardless of where they are in a hierarchy. Thus DeadEnds does not enforce any restrictions on where sources can be so none of the limitations you guess may exist really do exist. If you check the DeadEnds model you will see that every person record can refer to sub-persons, that every person can refer to a source (in fact any number of sources), and that every fact or other attribute within a person can refer to a source (in fact any number of sources). Every, every.
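
To make the "every, every" point concrete, here is a minimal sketch of a person record with no restrictions on where sources may be attached. It is not the actual DeadEnds schema; the class and field names and the sample values are assumptions made up for illustration. Any person may have sub-persons, any person may cite any number of sources, and any fact may cite any number of sources.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Source:
    description: str


@dataclass
class Fact:
    kind: str
    value: str
    sources: List[Source] = field(default_factory=list)  # any number, per fact


@dataclass
class Person:
    name: str
    facts: List[Fact] = field(default_factory=list)
    sources: List[Source] = field(default_factory=list)  # any number, per record
    sub_persons: List["Person"] = field(default_factory=list)


def tiers(person: Person) -> int:
    """Number of person levels at and below this record."""
    return 1 + max((tiers(p) for p in person.sub_persons), default=0)


# One-level use: a conclusion person whose facts cite sources directly.
flat = Person(
    "John Doe",
    facts=[Fact("birth", "1850", [Source("birth certificate")])],
)

# Multi-tier use: personas cite the records they came from; the person
# above them cites the reasoning for the join as its "source".
persona_a = Person("John Doe", sources=[Source("census record")])
persona_b = Person("J. Doe", sources=[Source("marriage record")])
joined = Person(
    "John Doe",
    sub_persons=[persona_a, persona_b],
    sources=[Source("reasoning: same name, age and residence")],
)

print(tiers(flat), tiers(joined))
```

The same record type serves both uses; any rules about where sources belong would come from how the researcher works, not from the structure.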

I put no such restrictions in DeadEnds because I want it to be usable in any context where person records are involved. I personally use the model only in the context where the lowest level person records, the personae, come from individual records, and only in the context where higher level person records represent conclusions about the person records further below. In this context, you can draw up obvious rules as to where sources would be and what they would mean. And these are the rules you refer to. But if others want to use that model in different ways, where, for example, there is only one layer of person records, or where, for another example, there are two levels, but the bottom ones don't have to come from records (this is exactly the New Family Tree situation), then the sources could be used in completely different ways.

I stress the utility of the multi-tier person trees, with the persona records at the bottom, with the sources arranged the way I describe, because this is how the structures would be built if you were following a very scientific approach to your genealogy, collecting your facts, arranging and reasoning about your facts, and then building up your conclusions. I hope by now it is pretty clear how the structures I promote provide a good framework for genealogical research. All of the methods taught by all the teachers and organizations follow this scientific approach and my multi-tiered structures provide a means to organize data on a computer that fits the process. But it is not required and there are no restrictions in the DeadEnds model to force it to be.

ttwetmore 2011-06-11T07:18:52-07:00

"About the Biography world. I am not sure I understand the benefits of this separation into worlds, it seems like the model is more important than practical considerations. I am not sure I understand what the Biography world is. Since my main purpose of entering data in a genealogy program is to be able to produce reports and charts, I have had focus on that need all the time. Is the suggestion that we have to record the information more than once? We may also end up with a “Research process world” where at least the reasoning could be recorded, and the reasoning could end up in the Conclusion world, and then we have to record it a third time in the Biography world. That is a lot of recording."

The benefits of separation come in better understanding the nature and reasons for the records in your database.

My purpose in introducing the Biography world was to talk about types of data we all want in our databases that don't come directly from either the evidence we find or from the conclusions we make. It is the extra information we want associated with our conclusion persons that we want to see in our reports. In my current LifeLines database I put this information under special GEDCOM tags so that report programs can find the information and treat it specially in reports.

There is no duplication of data in a well-designed model, and nothing I have proposed in any of my models causes duplication. I implement this stuff in software. If there were any duplication I would be the first to see it and remove it.

ttwetmore 2011-05-30T12:59:53-07:00

Issue 1: I am hung up with the word citation. To me a citation is a string that describes where in a source an item of evidence comes from. This is the standard definition; even ESM says so. I now accept that Geir's Citation record is EXTRA information (that is, information beyond the pure author/title/publisher stuff) that gives additional information about the EVIDENCE WITHIN THE SOURCE (e.g., page number, researcher's reaction to the evidence, synopsis of evidence, researcher's judgement of quality, researcher's thoughts and maybe early conclusions about the evidence, and so on, basically ANYTHING the genealogist would like to record about the evidence except the genealogical information itself that is in the evidence).

Okay, now a level-2 or level-n CP is, I think from what Geir has written, a Conclusion Person. Therefore there is no evidence anywhere in any source that can be used as a Citation. Yeah, the level-1 Persons making up the level-n Persons have Citations, because they do come straight from evidence, but all the level-2+ Persons come from conclusions made by the researcher. So I ask myself, what would a Citation be in that context? Well, I can imagine that it is the conclusion itself, maybe the proof statement, maybe a sentence that justifies the conclusion. If this is the case then I feel that calling it a Citation, though a little awkward, is okay, because, in one form or another, that conclusion should show up as a footnote or in a bibliographic context.

My real question is, what is a Citation when pointed to by a level-2+ Person record?

Issue 2: When multi-levels are collapsed, the Citations from the level-1 Persons must come up into the final Person, but be attached to the individual facts that are brought up, not to the whole Person. The Citations in the highest level Person should remain intact at the whole-Person level. I'm afraid that all the intermediate level Citations are much more difficult to handle and it might be better just to get rid of them.

Example: We have 4 level-1 Persons, joined as two pairs into 2 level-2 Persons, joined together into a single "root" level-3 Person. We start then with 7 person records, and each has a Citation. Four of those Citations are to evidence and 3 of those Citations are to conclusions. Now say we collapse that 3 level (7 record) Person into a level-1 (1 record) Person for a level-1 system genealogical application. We SQUASH all the facts together into a single record. Each fact brings along the Citation that was in the original level-1 Person that the fact came from. The Citations in the 2 level-2 Persons are removed. The Citation in the original level-3 Person is kept as the Citation for the new level-1 record as a whole. I think this is also what Geir is saying above.
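
The squash in this example can be sketched in a few lines. This is a rough illustration only: the record layout, the names and the placeholder strings are invented, and it assumes facts live only on the level-1 Persons (facts held on higher-level Persons are not handled).

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Person:
    citation: str
    facts: List[str] = field(default_factory=list)
    children: List["Person"] = field(default_factory=list)


@dataclass
class FlatPerson:
    citation: str                 # whole-record Citation, kept from the root
    facts: List[Tuple[str, str]]  # (fact, Citation of its level-1 Person)


def collapse(root: Person) -> FlatPerson:
    facts: List[Tuple[str, str]] = []

    def visit(person: Person) -> None:
        if not person.children:  # a level-1 Person: its facts keep its Citation
            facts.extend((fact, person.citation) for fact in person.facts)
        for child in person.children:
            visit(child)

    visit(root)
    # Citations on intermediate Persons are never copied, so they drop out.
    return FlatPerson(citation=root.citation, facts=facts)


# 4 level-1 Persons, 2 level-2 Persons, 1 level-3 root: 7 records in all.
leaves = [Person(f"evidence {i}", [f"fact {i}"]) for i in range(1, 5)]
pair_a = Person("conclusion A", children=leaves[:2])
pair_b = Person("conclusion B", children=leaves[2:])
root = Person("conclusion root", children=[pair_a, pair_b])

flat = collapse(root)
print(flat.citation)
print(flat.facts)
```

After the collapse the single record carries four facts, each with its evidence Citation, plus the root's conclusion Citation for the record as a whole; "conclusion A" and "conclusion B" are gone.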

Tom Wetmore

gthorud 2011-06-02T16:02:34-07:00

Re. Issue 1:
I have to repeat some of the views I have expressed elsewhere to get it into this context.

To me a citation is something that ends up as inline text, a footnote or an end note. Whether it contains evidence or conclusions is not a concern of mine. In the real world people will put anything they like in it, so I don’t see any point in discussing if it is evidence or conclusions, and I don’t see any problem in having citations containing anything for conclusion persons. I don’t believe in redesigning the way people work to fit a multilevel E&C model. This has been argued in many discussions in the previous months.

Also, conclusions about a person may be based on information that it would not be appropriate to record in an EP for that person, it may be some argument that is not directly related to that person, but nevertheless relevant as evidence supporting a conclusion.

An aspect not mentioned before:

If you import a single level file, the persons will be conclusion persons since they in most cases will cite many sources – they are not Personas or EPs. You cannot place restrictions on citations in this case. I don’t think I would like to have two sets of user interface controlling the content of Person records, and it will not be very user friendly.

ttwetmore 2011-06-02T16:55:21-07:00

Geir,

What for you is a citation is not what a citation is. No wonder it was hard to figure out what you meant -- but I did figure it out yesterday when I realized that the best thing to call it would be a Footnote -- it might even be better to call it a Proto Footnote! If you redefine words you can expect nothing but more confusion from the rest of us. Please read a few definitions of the word citation from a few places. You are defining a Citation to be "anything at all that a genealogist would like to say about a source, a bunch of sources, a conclusion, his process of reasoning, or anything at all of interest." That is most definitely not what a citation is. What you are talking about is mostly just a free note, what some here have called a reference note. I have nothing against this thing that you are calling a Citation; it's an important concept, so I'm not rejecting it at all. BUT IT IS NOT A CITATION, and you should stop calling it a citation.

You talk about not wanting to change how people work to fit a multilevel E&C model, I think with the implication that your model allows this to happen. I call you out on that statement. How much do you really know about how people work in order to make the judgement that your approach is right? I would claim just as loudly that the DeadEnds model not only allows users to continue to use E&C the way they want to, but is also an improvement over much of what is possible today. And I probably don't have any more data to support my statement than you have to support yours!!

You talk about importing a single level file and you say you can't put restrictions on citations in this case. I call you out on that statement also. Of course you can. If the file is GEDCOM, there are no Citations to speak of at all, just pointers to Source records at best. If the single level file is a Better GEDCOM file, its restrictions on Citations will be exactly what we allow in the standard.

I agree that conclusions about persons may be based on information not appropriate to record in an EP. You include that information in your Conclusion record that goes with the Conclusion Person.

Tom

GeneJ 2011-06-03T17:55:44-07:00

@Tom,

In the context of E&C, we need to move beyond the discussion of what a citation is or is not.

A citation in the real world is more than "a string that describes where in a source an item of evidence comes from."

I'm not sure why anyone thinks Elizabeth Shown Mills thinks a citation is "a string ..." Please see the page link below, in particular the opening paragraph, for the Mills quote from EE (2007), "As researchers, we should continue to evaluate the credibility of every source against new evidence. To do that, our research notes must do more than merely name a source and cite its location. Our notes should also describe the source in sufficient detail that we, at any future point, can reconsider our evaluation. As writers, we owe our readers that same description, so that they can better assess the soundness of our judgment. ... The citation principles and models presented in the chapters that follow are crafted around these three essentials: evaluation, identification, and description." [p. 38]

http://bettergedcom.wikispaces.com/Mills+550+-+_Evidence+Explained_+and+related+field+descriptions

Repeating, "citation principles ... three essentials ... evaluation, identification, and description."

If we need to continue discussions about citations can we please hold those discussions on the page, err... "About Citations?"

http://bettergedcom.wikispaces.com/About+Citations#About%20citations

Thank you. --GJ

ttwetmore 2011-06-03T18:16:37-07:00

When a computer scientist says "string" he/she simply means text, something made up of characters.

gthorud 2011-05-30T12:35:49-07:00

Linking from a Lookup record to codified data in Person and Event records

This is based on the document presented in http://bettergedcom.wikispaces.com/message/view/Defining+E%26C+for+BetterGEDCOM/39358416

The document proposes to link from a Lookup record to all person records created as a result of the lookup.

A better way than linking only from Lookup to Persons may be to link from Lookup to Event common data records (indirectly pointing to the persons in the events), and if all persons resulting from the lookup are not pointed to via those events, also point directly to the remaining persons – those where you have recorded info that is not held in an event (e.g. person note). You can for example list the info found in a lookup by listing primarily events (with persons), and then the remaining persons – or if you like the other way around. This will allow recording of the sequence of events and persons that could reflect the sequence in the logic behind the codification. For example recording the content of a census record, you could start with the census event (also listing the persons), followed by birth and residence events for the persons.
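
As a rough sketch of this linking (all names and sample values here are invented for illustration and are not taken from the document), a Lookup could hold an ordered list of events plus a short list of the persons that no event reaches. Listing the persons found in the lookup then means walking the events first and appending the remaining persons.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)  # compare records by identity, not by field values
class Person:
    name: str


@dataclass(eq=False)
class Event:
    kind: str
    participants: List[Person] = field(default_factory=list)


@dataclass
class Lookup:
    where_in_source: str
    events: List[Event] = field(default_factory=list)    # in codification order
    persons: List[Person] = field(default_factory=list)  # not in any event


def persons_of(lookup: Lookup) -> List[Person]:
    """Persons reached via events first, then the directly linked rest."""
    via_events = [p for ev in lookup.events for p in ev.participants]
    found: List[Person] = []
    for person in via_events + lookup.persons:
        if person not in found:
            found.append(person)
    return found


# A census household: the census event first, then derived events.
head = Person("head of household")
wife = Person("wife")
noted = Person("person known only from a note")
lookup = Lookup(
    "household entry in a census",
    events=[
        Event("census", [head, wife]),
        Event("birth", [head]),
        Event("residence", [wife]),
    ],
    persons=[noted],
)

print([p.name for p in persons_of(lookup)])
```

Because the event list is ordered, the "primary" census event can be kept ahead of the "secondary" birth and residence events derived from it.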

ttwetmore 2011-05-31T10:01:57-07:00

Geir says, "The document proposes to link from a Lookup record to all person records created as a result of the lookup."

I've said before that in my opinion these links are going the wrong way.

Think of this as a data flow problem. You are conducting a search. You find a source with evidence you are interested in. You record information about the source in a Source record. (I do not understand the nuances of the differences between a Source record and a Source Lookup record).

Now you decide which items of evidence in the Source you wish to record more fully. You do this by adding more info either inside the body of the Source record itself (see Geir's picture), or as codified Person (Persona) and/or Event records. (You don't have to process all the evidence from the sources at the same time; after all it might be a very big source, like a book with lots of info, and you might pick and choose the evidence you want to treat at different times).

Consider the links you need when codifying a Person or Event record. You create the record and then (I humbly submit) you should link it back to the Source record, meaning, "I am evidence and I came from that source." Now in the DeadEnds model I do that by putting a SourceReference structure inside the Person record. The SourceReference structure has a pointer to the Source record, but it also allows any number of other fields that you can use to put in "citation" information. I have accepted the fact that Better GEDCOM is heading in the direction of treating this citation information as more important than I do, so Better GEDCOM might create a separate Citation record. So in the Better GEDCOM case, after creating the Person record, part of the process of linking that Person back to its Source should involve the creation of a Citation record. At the end of this process the Person (or Event) record points to the Citation record which points to the Source record.
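The pointer direction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, field names, and the example title are hypothetical and are not taken from DeadEnds or from any Better GEDCOM draft.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional
import uuid


def new_id() -> str:
    # Assumption: record ids are UUID strings.
    return str(uuid.uuid4())


@dataclass
class Source:
    title: str
    id: str = field(default_factory=new_id)


@dataclass
class Citation:
    source_id: str        # points to exactly one Source
    where_in_source: str  # e.g. page or entry (free text here)
    id: str = field(default_factory=new_id)


@dataclass
class Person:
    name: str
    citation_id: Optional[str] = None  # the evidence record points back to its Citation
    id: str = field(default_factory=new_id)


def source_of(person: Person,
              citations: Dict[str, Citation],
              sources: Dict[str, Source]) -> Source:
    """Follow Person -> Citation -> Source."""
    citation = citations[person.citation_id]
    return sources[citation.source_id]


# Hypothetical data, created in the order the text describes.
source = Source(title="Example source")
citation = Citation(source_id=source.id, where_in_source="entry 1")
person = Person(name="Daniel Wetmore", citation_id=citation.id)

sources = {source.id: source}
citations = {citation.id: citation}
found = source_of(person, citations, sources)
```

Note that the Source record holds no list of the Persons derived from it; finding them means scanning Person records for matching Citations, which is the trade-off of pointing in this direction.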

I get the feeling that this Person->Citation->Source "architecture" is really quite different than the Geir architecture. I guess instead of me making comments on why I think the Geir architecture is too complex (having too many records to begin with) and backwards (in terms of the directions of the pointers), I would like to hear some comments as to why the DeadEnds architecture is not better, simpler, and easier to understand.

(At some point I would like to understand what a Lookup record is.)

Tom

gthorud 2011-05-31T17:57:47-07:00

Event role and two way linking of records

In my original document http://bettergedcom.wikispaces.com/message/view/Defining+E%26C+for+BetterGEDCOM/39358416 I have placed the role, the reference to "Event common data" etc in the Person record.

About two thirds down in this posting http://bettergedcom.wikispaces.com/message/view/Defining+E%26C+for+BetterGEDCOM/39358416#39726236 - under the heading "Event Common Data Record", Tom argues for placing the role in the Event (common data) record, and have a reference to the person record together with the role - in addition to the reference from the person record. This is the same as in the DeadEnds XML document.

I had expected an application to create a reference from the event to the relevant part of the person record when importing the file, so the difference is not big. Either way would work.

One difference is, do we want to carry an extra UUID for each person<->event relation or not? One question is how big the total overhead will be. I assume the UUIDs used for relations can be dropped by a program internally and recreated on export if storage is a concern. A database would most likely not use UUIDs as internal keys.

But if we go for this, it would most likely mean having UUIDs (and perhaps more, see below) both ways for all relations. I would like to hear if anyone else has a view on this.

Also, wrt the Person<->Event relation in particular. DeadEnds has personRoleType and eventRoleType - am I right in assuming these would have the same value for a given relation? Why duplicate? Do we also need to duplicate the rest of my Event participation structure - hope not. Isn't the UUID sufficient?

Also, there are eventRoleReferenceAttribute and personRoleReferenceAttribute in DeadEnds - what is the purpose of these? Are they also duplicates?

ttwetmore 2011-05-31T19:45:03-07:00

G > In my original document http://bettergedcom.wikispaces.com/message/view/Defining+E%26C+for+BetterGEDCOM/39358416 I have placed the role, the reference to "Event common data" etc in the Person record.

G > About two thirds down in this posting http://bettergedcom.wikispaces.com/message/view/Defining+E%26C+for+BetterGEDCOM/39358416#39726236 - under the heading "Event Common Data Record", Tom argues for placing the role in the Event (common data) record, and have a reference to the person record together with the role - in addition to the reference from the person record. This is the same as in the DeadEnds XML document.

G > I had expected an application to create a reference from the event to the relevant part of the person record when importing the file, so the difference is not big. Either way would work.

Yes, some comments below.

G > One difference is, do we want to carry an extra UUID for each person<->event relation or not? One question is how big the total overhead will be. I assume the UUIDs used for relations can be dropped by a program internally and recreated on export if storage is a concern. A database would most likely not use UUIDs as internal keys.

In my opinion only records have UUIDs, so each person record has one and each event record has one. The reason I suggested that the record IDs be UUIDs is so that when records from one database are imported to another we can guarantee that there will be no identifier clash. Today, when you import a GEDCOM file into an application, there is a reasonably complex step of changing many IDs in the imported records so they don't clash with the IDs that are already in the database.
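A small Python sketch of that import argument follows. It assumes records are plain dictionaries keyed by a random (version 4) UUID string; the helper names `new_record` and `import_records` are made up for illustration and do not come from any existing program.

```python
import uuid


def new_record(kind, **fields):
    # Assumption: every record gets a random UUID when it is created.
    return {"id": str(uuid.uuid4()), "kind": kind, **fields}


def import_records(database, incoming):
    """Add incoming records to the database without renumbering any ids."""
    for record in incoming:
        if record["id"] in database:
            # Practically impossible for independently created UUIDs;
            # kept only to make the "no clash" assumption explicit.
            raise ValueError("duplicate record id: " + record["id"])
        database[record["id"]] = record
    return database


# Two sets of records created independently, then combined.
db_a = {}
import_records(db_a, [new_record("person", name="Daniel Wetmore")])

file_b = [
    new_record("person", name="Daniel Wetmore"),
    new_record("event", type="reside"),
]
import_records(db_a, file_b)
```

With GEDCOM-style ids such as @I1@, both files would likely contain the same id for different records, which is why today's importers need the renumbering step.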

G > But if we go for this, it would most likely mean having UUIDs (and perhaps more, see below) both ways for all relations. I would like to hear if anyone else has a view on this.

Again, my view is that UUIDs are record IDs only. How could you make a UUID a property of a relation? Why would you make a UUID a property of a relation? Do you think that there should be Relation records?

G > Also, wrt the Person<->Event relation in particular. DeadEnds has personRoleType and eventRoleType - am I right in assuming these would have the same value for a given relation? Why duplicate?

Yes, they'd be the same. Duplication not necessary, but makes reading the files easier.

G > Do we also need to duplicate the rest of my Event participation structure - hope not. Isn't the UUID sufficient?

No duplication!

G > Also, there are eventRoleReferenceAttribute and personRoleReferenceAttribute in DeadEnds - what is the purpose of these? Are they also duplicates?

Here's the principle I'm after -- the Event record gets all the information about the Event and the Person record gets all the information about the Person. But, there is a big context issue very often. Very often the information about the Person is information that is ONLY TRUE IN THE CONTEXT OF THE EVENT. Two great examples are the person's AGE and the person's OCCUPATION. These are properties of the Person, but they are not "intrinsic" to the person, only "transitory" or "context sensitive" with respect to the Event.

Let's do an example to show this. You find this evidence, "In 1876, Daniel Wetmore, aged 53, lived in Norwich, Connecticut."

From this you codify one event record and one person record:

person: {id: i1; name: "Daniel Wetmore"; role: {type: primary; event: e1; age: 53 years}; source: sx}

event: {id: e1; type: reside; date: 1876; place: "Norwich, Connecticut"; role: {type: primary; person: i1}; source: sx}

Note where the age information is. It isn't a direct attribute of the person. It is an attribute of the role reference. This is what a personRoleReferenceAttribute is, a non-intrinsic attribute of a Person in the context of the event.

Note that, in some sense, a purist could argue that even the Person's name is a non-intrinsic attribute of the person, and say it too should be pushed down into the roleReference. So some might argue that the person record should look like this:

person: {id: i1; role: {type: primary; event: e1; name: "Daniel Wetmore"; age: 53 years}; source: sx}

I am not in that set of purists.

Let me be clear in this example. "primary" is the personRoleType and the eventRoleType. The structures starting with the key "role" are the entire personRoleReference in the Event record and the eventRoleReference in the Person Record. And finally the age value in the Person record is an eventRoleReferenceAttribute.

You could use a different rule, and put the person's age as a personRoleReferenceAttribute in the event's personRoleReference structure. Or you could duplicate it both places. My common sense says that this age is most naturally thought of as a property of a person, so it should be somewhere in the Person's record.

Also note that I did not add the birth vital event to the Person. In a real codification of this evidence I would have done that, that is, the person record would look like this:

person: {id: i1; name: "Daniel Wetmore"; role: {type: primary; event: e1; age: 53 years}; vital: {type: birth; date: "about 1823"; note: "computed from age at residence"}; source: sx}

Maybe this is easier to see in GEDCOM-like syntax:

0 @I1@ INDI
1 NAME Daniel Wetmore
1 ROLE primary
2 EVEN @E1@
2 AGE 53 years
1 BIRT
2 DATE about 1823
2 NOTE Computed from age at residence event.
1 SOUR @SX@

Or you might, in GEDCOM, prefer:

0 @I1@ INDI
1 NAME Daniel Wetmore
1 EVEN @E1@
2 ROLE primary
2 AGE 53 years
1 BIRT
2 DATE about 1823
2 NOTE Computed from age at residence event.
1 SOUR @SX@
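The same example can be written as a short Python sketch that keeps the age inside the role reference and derives the approximate birth date from it. The dictionary layout and the key names `age_years` and `date_year` are assumptions made for illustration, not DeadEnds syntax.

```python
# The person record: age lives inside the role reference, not on the person.
person = {
    "id": "i1",
    "name": "Daniel Wetmore",
    "roles": [{"type": "primary", "event": "e1", "age_years": 53}],
    "source": "sx",
}

# The event record, pointing back at the person with the same role type.
event = {
    "id": "e1",
    "type": "reside",
    "date_year": 1876,
    "place": "Norwich, Connecticut",
    "roles": [{"type": "primary", "person": "i1"}],
    "source": "sx",
}


def estimated_birth(person, events_by_id):
    """Derive an approximate birth from the first role that carries an age."""
    for role in person["roles"]:
        if "age_years" in role:
            ev = events_by_id[role["event"]]
            year = ev["date_year"] - role["age_years"]
            return {
                "type": "birth",
                "date": "about " + str(year),
                "note": "computed from age at event " + ev["id"],
            }
    return None  # no age recorded in any role


birth = estimated_birth(person, {event["id"]: event})
```

Because the age is stored with the role, a second event with a different age for the same person would not conflict with this one.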

ttwetmore 2011-06-01T11:38:52-07:00

Footnotes Are Not Citations

I finally understand where my confusion vis a vis Geir's Citation and Source Lookup records comes from!!!

I just read the Better GEDCOM page about Citations and realized that many of the "citations" given as examples on that page are not Citations!! They are simply conventional Footnotes.

Here are two standard definitions taken from a dictionary.

Footnote -- An ancillary piece of information printed at the bottom of a page.

Citation -- A quotation from or reference to a book, paper, or author, esp. in a scholarly work.

It is certainly correct to say that a Citation must be able to appear as a Footnote (it is one of the ESM approved formats for Citations), but it is incorrect to call anything that could be a Footnote a Citation. Somehow we got caught up in an old logical trap here. I now understand that Geir's model is forcing anything that could appear as a Footnote to be called a Citation.

No wonder I was so confused. I finally understand what Geir's Source Lookup Record really is -- IT IS the REAL Citation record, and Geir's Citation record is really an attempt to be a more general purpose Footnote Record!! It all falls into place.

We could define a Better GEDCOM Citation this way, but it is unconventional, it causes great confusion (at least it caused me great confusion), and it redefines very well understood terms. I don't think we want to do this.

A Footnote is an almost totally free format item that has to be tied to some point in the text of a document so it gets printed on the same page. I don't know how you could put any real constraints on the format of a Footnote at all -- after all the whole purpose of a Footnote is to hold whatever additional information the author thinks might be helpful when reading the text keyed to it -- there are NO RULES about what an author can put in a Footnote.

If we are thinking of Better GEDCOM as supporting a fully blown document generation system, we have a lot further to go in figuring this out. We would have to formalize a way of keeping special kinds of notes in person records, notes that are intended to be parts of generated documents. If we had these we could also have Footnote Records that these special notes could refer to.

Frankly I've worried about this exact problem many times in the past, and over 20 years ago I wrote a neat program that could generate great genealogical text. In my current LifeLines program I have a way of generating text. I use the PARA tag and the SENT tags (paragraph and sentence) within my INDI records, and I have LifeLines report programs that capture the text under these tags and format them into a generated document.

I think this is a fruitful area to discuss, but it seems to me that we are still pretty far away from that point.

Tom

ttwetmore 2011-06-01T14:05:57-07:00

Adrian,

Thank you.

Tom

AdrianB38 2011-06-01T14:08:28-07:00

Now, before various people immediately point me to examples of documents where the footnote style citations refer to several sources - yes I know. I'm just reading a book on the Brus family of 1100-1300 (the family that gave rise to Robert the Bruce, King of Scots and to my surname) and - in the middle of trying to keep on top of feudal practices - I find my attention distracted by analysing the author's means of citing her sources. And yes, the footnotes do mention multiple sources, just as Gene's footnotes in http://bettergedcom.wikispaces.com/About+Citations do the same thing.

So the footnote referring to multiple sources has a long and honourable history in scholarly writing.

But now can we be practical?

As Tom suggests, the author of my Brus book wrote those footnotes in free format - albeit probably pasting in pre-prepared citations. But for those of us using a genealogy app, we either try to fit those footnotes into what are (currently) GEDCOM style citations or we try something new.

It IS possible to fit those footnotes into what are currently GEDCOM style citations - we simply use the note that is linked to the GEDCOM style citation and MANUALLY write in the text of the footnote, including MANUALLY writing in the ESM style citation. For those of you with a black belt in ESM format citations - no problem. But we are continually hearing from American genealogists how poor those without a black belt are in doing citations. So this idea doesn't really help the general populace. And in any case, the manually written citations are not visible to the application as anything other than free-format text, so the application isn't going to help you at all with those sources.

So, _if_ we are to improve things, we can't use the GEDCOM-style citation to refer to multiple sources - either through explicitly referring to them or through free-format text. We would need to concoct something new. BUT - frankly - should we be doing it? A multiple source footnote is, as I said, something of honourable history in documents. But it's about the _format_ of the _document_, not the analysis of the data. I seriously do not want to spend time designing bits of BG to format documents in a specific manner when there is so much that we have not touched on with respect to the analysis of the raw data. For instance - the analysis of the ESM style citation formats? (Single source formats!)

It might be possible to create multiple source footnotes in the application by optionally concatenating consecutive footnotes for the same value - that's all I can think of, at the moment. This could be done in the app without altering the BG data structure.

In summary, let's confine BG to genealogy not desk-top publishing?

GeneJ 2011-06-01T16:02:31-07:00

Footnotes are one form that citations may take.

Citations, per our BetterGEDCOM definitions, are (from _Evidence Explained_, 2007; electronic version, p. 820) "statement in which one identifies the source of an assertion. Common forms of citations are source list entries (bibliographic entries), reference notes (endnotes or footnotes), and document labels."
http://bettergedcom.wikispaces.com/Glossary+Of+Terms

For more common use of the terminology, see _Wikipedia_, "Citation." Since the introduction refers more to the scientific style, skip down to "Citation Systems," for "Note Systems." From the source, "Note systems involve the use of sequential numbers in the text which refer to either footnotes (notes at the end of the page) or endnotes (a note on a separate page at the end of the paper) which gives the source detail." One or more examples are shown.
http://en.wikipedia.org/wiki/Citation

For those otherwise inclined, see, Chicago Manual of Style, see "14.2, Chicago's two systems of source citation," for "This chapter describes the first of Chicago’s two systems of documentation, which uses a system of notes, whether footnotes or endnotes or both, and usually a bibliography. The notes allow space for unusual types of sources as well as for commentary on the sources cited, making this system extremely flexible. Because of this flexibility, the notes and bibliography system is preferred by many writers in literature, history, and the arts. Chicago’s other system—which uses parenthetical author-date references and a corresponding reference list as described in chapter 15—is nearly identical in content but differs in form. The author-date system is preferred for many publications in the sciences and social sciences but may be adapted for any work, sometimes with the addition of footnotes or endnotes. For journals, the choice between systems is likely to have been made long ago; anyone writing for a journal should consult the specific journal’s instructions to authors (and see 14.3)."

Before trying to help about multiple source citations/footnotes/endnotes, do you want me to comment in the discussion about E&C, or should it be somewhere else?

gthorud 2011-06-01T16:25:48-07:00

No one has in recent discussions claimed that everything that can be in a footnote is a citation. But, since we have most of the mechanics required to produce a non-citation footnote in place for citation footnotes, the structures for citations should also be used for that. Indeed the mechanics for a footnote is a subset of the functionality for a citation which ends up in a footnote. You might consider calling the record a footnote record rather than a citation, but then the citations can also end up as endnotes. And calling the record a note record, as some citation documents do, would cause total confusion.

Using the citation record for a non-citation foot/end note has nothing to do with the Lookup record, because you would not cite any sources in a non-citation foot/end-note. So the non-citation footnote is not the central thing in this.

I think the proper order of issues, when looking at the possibility of having e.g. several events in a person record referring to the same citation is to look at how you would merge EPs into CP when collapsing a multi-level structure. Even then, you could create several identical copies of the citation, but that could produce a LOT of footnotes in a report. So, there are practical considerations as well, having long lists of “ibids” is not nice reading, and it is a waste of paper.

An alternative solution to having a citation refer to several sources, but still produce a foot/end-note with several cited sources and reasoning, would be to include an indication that a set of one-source citations, and non-citation footnote text should be merged in one footnote, and in what order/sequence. Multiple citations merged into one foot/end-note, separated by semicolons, is something that is supported by current genealogy programs and the style manuals. One rule would apply, such citations merged in this way shall not be merged with other citations/non-citation footnotes.
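One way to picture that alternative is the Python sketch below. It assumes each single-source citation (or non-citation note) carries an optional merge-group marker and an order number; the key names `merge_group` and `order` are hypothetical, not part of any proposal in this thread.

```python
def render_footnotes(items):
    """Turn single-source citations and notes into footnote strings.

    Items sharing a 'merge_group' become one footnote, joined by
    semicolons in 'order' sequence. Items without a group stay separate,
    so merged items are never combined with anything outside their group.
    """
    footnotes = []  # list of lists, in order of first appearance
    groups = {}     # merge_group -> list already placed in footnotes
    for item in items:
        group = item.get("merge_group")
        if group is None:
            footnotes.append([item])
        elif group in groups:
            groups[group].append(item)
        else:
            groups[group] = [item]
            footnotes.append(groups[group])
    return [
        "; ".join(m["text"] for m in sorted(members, key=lambda m: m.get("order", 0)))
        for members in footnotes
    ]


# Placeholder texts stand in for real formatted citations.
items = [
    {"text": "Citation A", "merge_group": "g1", "order": 2},
    {"text": "Citation B"},
    {"text": "Citation C", "merge_group": "g1", "order": 1},
]
rendered = render_footnotes(items)
```

Here the data still holds only one-source citations; the grouping affects presentation alone, so an application that ignores the marker would simply print three footnotes.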

Gene, comments on other E&C stuff should preferably be posted in the relevant discussion - if there is one - and possibly referred to in this discussion.

ttwetmore 2011-06-01T18:59:30-07:00

Geir,

I'm not going to respond to your last note, as I think you are making things more and more complicated, and I would rather concentrate on how simple I think things really are. So instead I will soon be posting another note that tries to put my spin on things.

Thanks,

Tom

ttwetmore 2011-06-01T19:17:18-07:00

And here is that other note.

Things are going astray. I believe some fundamental concepts are being lost. And I think Geir's model is contributing to the confusion. We are mixing things up and therefore taking a simple picture and making it almost completely obfuscated.

In the Evidence-world we (should in my humble opinion) have:

Sources containing genealogical information. We have Citations that describe where in the Sources we found that information. WE MAY HAVE codified some information from the Sources that allows us to put the genealogical information into convenient Person or Event record form for further processing. In my DeadEnds version of the world this is done by having Source records and then Person and Event records that codify information from the Sources and then refer to those Source records. I put the citation information into the references from the Persons and Events to the Sources. I have accepted the utility of having another record, the Citation record, so that a codified record can point to its SINGLE Citation record that then points to its SINGLE Source record. The Citation record specifies where in the Source the information codified into the Person record came from, AND it can have a few other things, such as researcher's comments, synopses, a string that should be shown when the citation is used as a bibliographic entry, and so forth. This concept of a Citation seems to me to be in full agreement with the conventional genealogical citation concept as described by ESM, as defined by the "Chicago Manual of Style" etc.

Then we have the Conclusion-world:

The main part of the Conclusion-world are the Person records that represent real people. If we have decided to codify Person records from the Sources, then we have two kinds of Person records in our database, those described above in the Evidence-world and these new ones. If we are lucky enough to have a system that supports codified Persons, then we seem to have decided to call those Persona records to avoid confusion. If we have decided not to codify then the Conclusion Person records are the only Person records in our database. Whether we have Personas or not, a Conclusion Person record must represent the collection of PFACTS we have gathered together from different Sources that we believe all apply to the same real Person. If we don't have Persona records, then each PFACT in the Person record will point to its own Source through a Citation record, that is, the Person record will point out to many Citations and thence to many Sources. In a Persona-rich world we would simply build a 2-tier tree with the Person record at the root and the Personas below, with each Persona still pointing to its own Citation and thence to a Source as described above. In both systems there are the same number of Citations pointing to the same number of Sources.

Okay, get ready, get set. In both the Persona-less and Persona-rich worlds we have Citations and Sources for each of the individual PFACTS, whether those PFACTS are squashed together into a 1-tier Conclusion Person or separated into Persona records in a 2-tier system. But in neither world do we yet have a METHOD TO DESCRIBE WHY WE THOUGHT THOSE PFACTS SHOULD HAVE BEEN BROUGHT TOGETHER INTO THAT ONE PERSON RECORD. We brought them together. We must have come to some conclusion, for some reason, that we thought justified bringing all the PFACTS together into a single Conclusion Person. SO WE MUST JUSTIFY IN SOME WAY THE BRINGING TOGETHER OF THE PFACTS. We have to have something that is analogous to the Citation and Source in the Evidence-world, in the Conclusion-world. We need the Conclusion Person to have something similar to a Citation that describes why we think that Person is real. It isn't a Citation because it doesn't refer to a single Source or necessarily any Source at all. But it must hold our conclusion in some form, and it sure would be nice if that form could look real nice as a Footnote. In the DeadEnds world I solve this problem by generalizing the notion of a Source to include "MY BRAIN." That is, a source is anything that can provide information. It was my brain that provided the information, in the form of my reasoning and conclusioning, that created the Person record. Now in DeadEnds remember there was no Citation record, just a structured pointer to a Source. In DeadEnds I put the "proof statement" or whatever else other text or information I thought should go into that justification in that special Source record. But if you don't like that idea, and if we want something analogous to the Citation record for the Conclusion-world, we could call it, by gum, a Conclusion record. Have the Person record point to a Conclusion record and put all the proof-statements, text description, and the like in there. 
The Conclusion record can stop there; it doesn't have to point anywhere, unless you really do want to put a Source in your database and label it "MY BRAIN." And be sure you can get a nice Footnote out of the Conclusion record.

As a benefit, I think you now have EVERYTHING YOU NEED FOR GENERATING YOUR Footnotes, Endnotes and Bibliographic Entries. You have all the Citations and you have all the Conclusions. If you are generating a document somehow, and that document mentions specific Persons, you can easily generate all the Citations and Conclusions that apply to those Persons as Footnotes, Endnotes, or Bibliographic entries.

Tom

AdrianB38 2011-06-02T08:34:48-07:00

Gene
"Footnotes are one form that citations may take". Indeed. However, the way I read those things that you reference, a citation always refers to a source in the singular. The issue here is a footnote that contains multiple source references, which by my interpretation of those definitions, means it's a footnote that contains several citations.

My own feeling on this is that we are mixing up 2 concepts here - the source of a fact and how we want it printed. Data-content and Data-presentation. And frankly I'd rather not stick the presentation aspect bang in the middle of a data-content relationship between "fact" and source.

Geir - I think your comment about an "indication that a set of one-source citations, and non-citation footnote text should be merged in one footnote, and in what order/sequence" is much more sensible - though I don't know whether you mean an indication in the BG standard or what.

(As an aside, one reason that I can't get enthusiastic about codifying multi-source footnotes in BG is that what's the point in having nicely sophisticated footnotes when the basic text of a report is in such appalling computer speak?)

AdrianB38 2011-06-02T09:47:35-07:00

Tom & co - my take on the Data Model re Evidence etc has just been loaded at the foot of
http://bettergedcom.wikispaces.com/Research+Process%2C+Evidence+%26+GPS

It is lacking in some areas, I'm sure, but seems to relate to what is being discussed here - though naturally is different.

GeneJ 2011-06-02T11:15:23-07:00

@Adrian,

The passages referenced in my earlier post were intended only as a response to the title of this discussion topic, "Footnotes are not citations." I specifically didn't want to comment on "Defining E&C..." about "multiple" sources in a "reference note."

The concept of more than one source being referenced in a note seems less a cause and more the effect of "codifying" particular "evidence" (as that term is used in E&C) and moving beyond "I think this is the same person because," etc.

Sooo ... isn't the real issue here vested in other discussions about E&C over the past month or so? --GJ

P.S. Adrian wrote, "... either try to fit those footnotes into what are (currently) GEDCOM style citations or we try something new." Seems to me this is better discussed in a different area at another time. The various software programs facilitate different approaches. We all know there are lots of users who'd like to see BetterGEDCOM be more about advancing the sharing of citations--as those take the form of reference notes and source list entries.

AdrianB38 2011-06-02T12:24:37-07:00

"The concept of more than one source being referenced in a note seems less a cause and more the effect of 'codifying' particular 'evidence'" Mmm. Not sure I totally agree with that - the Brus book has plenty of examples of multiple sources in a footnote and that's gone nowhere near a GEDCOM file. Though it did seem to pop up here as a result of the multiple levels of evidence & conclusions.

"We all know there are lots of users who'd like to see BetterGEDCOM be more about advancing the sharing of citations"
I'd also like to see BG advance the sharing of citation data - so if anyone else can help with the analysis of the structures of citations, we'd all be grateful.

GeneJ 2011-06-02T12:43:02-07:00

@Adrian ...

About the Brus book, see item no 6 in the discussion (link below) for comments on finalizing a genealogy:
http://bettergedcom.wikispaces.com/message/view/About+Citations/39827844

The sooner we can get E&C defined, etc., the sooner we can once again focus on citations.

gthorud 2011-06-02T15:35:11-07:00

I agree that this issue can be handled in the Citation work; it is not really a central issue of E&C. The important issue here is that one way to find data from the source in various variants is to go through structures holding citation data.

AdrianB38 2011-06-01T13:30:16-07:00

For once I am in agreement with those who wish to separate Citation (in the dictionary sense of a full indication of where to find stuff) from Citation (in the GEDCOM sense of indicating which source to find justification for a value in, where within, and by the way here's some comments about this combination). My reason is that it would make this discussion a bit easier to read.

Anyway - bearing in mind the terminology may get tricky... The GEDCOM-citation has in fact a very sound basis in data modelling. There is a many to many relationship between a genealogical value (i.e. an event or attribute value or a person's existence or...) and a source record. A source record may be cited by 1 or more values. A value may cite 1 or more source-records. Any data modeller who's got past the first week of Data Modelling 101 will resolve that many-to-many relationship with a 1-to-many to something in the middle and a many-to-1 from that thing in the middle to the other original. There won't be any arguing - they'll do it.

The bit-in-the-middle resolving the many-to-many between genealogical value and a source record is, in the GEDCOM data model, a citation - GEDCOM style. Thus one genealogical value may be justified by 1 or more GEDCOM-citations and one source-record may be referred to by many GEDCOM-citations.

But note that a single GEDCOM-citation must only refer to one genealogical value and a single GEDCOM-citation must only refer to one source-record.
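For readers who have not done Data Modelling 101, the resolution described above can be sketched in Python as follows. The record ids and page references are invented for illustration, and `GedcomCitation` is only a stand-in name for the bit-in-the-middle.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GedcomCitation:
    """Junction record: exactly one value and exactly one source."""
    value_id: str
    source_id: str
    where_within: str = ""


# Hypothetical data: one value justified by two sources,
# and one source cited by two values.
citations = [
    GedcomCitation("birth-of-i1", "s1", "p. 12"),
    GedcomCitation("birth-of-i1", "s2"),
    GedcomCitation("residence-e1", "s1", "p. 40"),
]


def sources_for(value_id: str, all_citations: List[GedcomCitation]) -> List[str]:
    """All sources cited by one value (the 1-to-many side)."""
    return sorted({c.source_id for c in all_citations if c.value_id == value_id})


def values_citing(source_id: str, all_citations: List[GedcomCitation]) -> List[str]:
    """All values that cite one source (the many-to-1 side)."""
    return sorted({c.value_id for c in all_citations if c.source_id == source_id})


birth_sources = sources_for("birth-of-i1", citations)
s1_values = values_citing("s1", citations)
```

Each `GedcomCitation` has room for a single `source_id`, which is the structural reason a citation of this kind cannot refer to several sources at once.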

Apologies to those of you who have done Data Modelling 101.

Now, if we are talking about citations referring to multiple sources, it is totally clear from the above that a GEDCOM-citation cannot do anything of the sort.

If you are immediately saying that a BetterGEDCOM citation should be able to refer to multiple sources, then this is a huge leap away from the GEDCOM-citation and frankly - let's be quite clear about this - 90% of the software guys will have got their Data Modelling 101 hats so firmly on their heads that they'll either ignore the requirement and just implement it as one source or they'll use it to prove that BetterGEDCOM can't even normalise a many-to-many so what possible use are they?

So please - can we _not_ go down the route of having anything that looks anything like a citation sitting between source and value where the citation-sort-of-thing refers to more than 1 source???? It'll be a disaster if we do....

Brief working description of E&C Model (concept)

Week 1 - Progress

Links

Comments

BetterGedcom E&C Specification Wiki Pages Organization

Official Specification Discussions

Proposals

Case Studies